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Abstract. This paper presents a family of new methods for locating/fitting hyperplanes with respect to a given 
set of points. We introduce a general framework for a family of aggregation criteria of different distance-based 
errors. The most popular methods found in the specialized literature can be cast within this family as particular 
choices of the errors and the aggregation criteria. Mathematical programming formulations for these methods 
are stated and some interesting cases are analyzed. It is also proposed a new goodness of fitting index which 
extends the classical coefficient of determination. A series of illustrative examples and extensive computational 
experiments implemented in R are provided to show the performances of some of the proposed methods. 


1. Introduction 

The problem of locating hyperplanes with respect to a given set of point is well-known in Location Analysis 
m- This problem is closely related to another common question in Data Analysis: to study the behavior of 
a given set of data with respect to a fitting body expressed with an equation of the form f(X±,... ,Xd) = 0. 
This last problem reduces to the estimation of the ‘best’ function / that expresses the relationship between 
the provided data or in other words to the location of the surface f(X) = 0 that minimizes some aggregation 
function of the distances of the points (data set) to the dimensional facility / (see [T1 ITU H5] 1. In many cases and 
for the sake of simplicity, the family of functions where / belongs to is usually fixed and then, real parameters 
of such a function must be determined. The most widely used family of functions considered in this framework, 
probably because of its simplicity, is the family of linear functions, namely the above equation is of the form 
f(Xi, ...,X d )=fi 0 + ELiPkX k = 0 for A>,/3i,... ,fi d S R. 

To perform such a fitting, we are given a set of points {au,... ,£„} C R d , and one tries to find the values 
0 = (fio, ■ ■ ■ > fid) that minimize some measure of the deviation of the data with respect to the hyperplane 
H(0) = {z € R d : 0o + Yl=i PkZk = 0}. For a certain observation x G M. d in the data set, such a deviation is 
usually known as the residual (terminology borrowed from the Statistical Regression literature). In a general 
framework, for a given point x € R d , we define the residual of a model as a mapping e x : M d+1 -a M+, that 
maps any set of coefficients 0 = {fio, ■ ■ •, fid) £ R d+1 , into a measure e x {0) that represents the deviation of the 
given point x from the hyperplane with those parameters. The larger this measure, the worse the fitting for 
such a point x. The final goal of fitting an hyperplane for a given set of points {xi, ..., x n } C R d is to find the 
coefficients minimizing a globalizing function, $ : K n —> M, of the residuals of all the points. Equivalently, the 
fitting problem consists in locating a hyperplane minimizing the globalizing function <f> of the distances from 
the demand points to the hyperplane. Different choices for the residuals and the globalizing criteria will give, in 
general, different optimal values for the parameters and thus different properties for the resulting hyperplanes. 
This problem is not new and some of these fitting criteria, as the minisum, minimax and some other robust 
versions, have been analyzed from a Locational analysis perspective (see among other). 

The most natural approach to locate a hyperplane is to consider that residuals, with respect to given points, 
are individual measures of error and thus, each residual should be minimized independently of the remaining. 
Obviously, this approach gives rise to a multicriteria problem mm- It is clear that this simultaneous min¬ 
imization will not be possible in most of the cases and then several strategies can be followed: one can try to 
find the set of Pareto fitting curves m or alternatively, to apply an aggregation function that incorporates the 
holistic preference of the Decision-Maker on the different residuals. This last choice is very difficult and the 
usual approach is to apply the principle of complete uncertainty leading to additive aggregations. 

The most popular methods to compute the coefficients of an optimal hyperplane consider that the residuals 
are the differences from one of the coordinates of the space (which are usually known as vertical/horizontal 
distances). In this paper we present a new framework for optimally locating/fitting hyperplanes to a set of 
points that allows the decision-maker to decide within a wide family of residuals and criteria which is the “best” 
for a given sample of data. One of the main contributions of our proposal is the use of modern mathematical 
programming tools to solve the problems which are involved in the computation of the parameters of the fitting 
models. The optimization models for those problems range from continuous convex programming (CP) to mixed 
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integer nonlinear programming (MINLP) through linear programming (LP). Many of the formulations described 
in this paper have been implemented in R in order to be available for data analysts. 

The framework in this paper introduces a family of combinations residuals-criteria that allows a great flexibil¬ 
ity to accommodate hyperplanes to set of points mm- This new framework can be easily combined with some 
of the mathematical programming techniques for feature selection, to “choose” a fixed number of coordinates 
to explain the dependence between the different dimensions [7], with classification schemes (6j, or when the 
coefficients of the linear manifold are required to fulfill a set of linear equations/inequalities. This framework 
can also accommodate general forms of regularization, as upper bound on the £ 2 -norm of the coefficients m, 
since it would only mean to add additional constraints to the mathematical programming formulations pro¬ 
posed in the paper. The complexity of solving the resulting model depending on the difficulty of the considered 
regularization constraints. 

In order to compare the goodness of the fitting for the different models we have developed a new generalized 
measure of fit. This task becomes difficult when one tries to compare fitting hyperplanes which are built based on 
different paradigms and purposes. The new measure is provided in order to make meaningful comparisons. This 
proposal is based on a generalization of the classical coefficient of determination, that will allow to measure 
how good is an optimal hyperplane with respect to the best constant model, Xd = Pq. This measure will 
extend the standard coefficient of determination for least squares fitting. We also perform an extensive series 
of experiments to validate the application of our results applied with different objectives to several set of data. 

In our framework, errors are measured as shortest distances, based on a norm, between the given points and 
the fitting surface. This makes the location problem geometrically invariant which is an interesting advance 
with respect to vertical/horizontal residuals. Through the paper we observe that this framework also subsumes 
as particular cases the standard location methods that consider residuals based on vertical distances (commonly 
used in Statistics); as well as most of the particular cases of fitting linear bodies using vertical distances but 
different aggregation criteria described in the literature, as t v fitting (£ p -norm criterion), least quantile of squares 
[Mill], least trimmed sum of squares H2E1, etc. As previously mentioned, the problem of optimally locating 
an hyperplane with respect to a set of demand points is closely related to the estimation phase in multivariate 
linear regression, where several methods have already been proposed. However, the use of nonstandard residuals 
is not usual in the literature of regression analysis although orthogonal (£ 2 ) residuals have been already used, 
see e.g. Euclidean Fitting 0CE3IM] or Total Least Squares [451 , mainly applied to bidimensional data. Quoting 
the reasons for that fact given by Giloni and Padberg in [IS]: “we have left out a summary of linear regression 
models using the more general £ T ,-norms with r ^ {1,2, 00 } for which the computational requirements are 
considerably more burdensome than in the linear programming case (as they generally require methods from 
convex programming where machine computations are far more limited today).” 

The paper is organized as follows. In Section [2] we introduce the new framework for fitting hyperplanes as 
well as some results that allows to interpret the results for practical purposes. Next, a residual-aggregation 
dependent goodness of fitting index is defined and it is presented an efficient approach for its computation. 
Section [3] is devoted to the analysis of the classical location methods under the new framework, more precisely, 
mathematical programming models for adequate aggregation criteria and residuals are provided for: 1) least 
sum of squares; 2) least absolute deviation; 3) least quantile of squares and 4) least trimmed of squares fitting. 
In Sections [I] and [5] we present new methods for the location of hyperplanes assuming that the residuals are 
measured as the smallest norm-based distance between the given points (data set) and the linear fitting body 
using polyhedral norms (Section Q]) and i T norms (Section [5]), respectively. We also present, in Section [5j outer 
an inner approximations for solving the resulting MINLP problems for £ p -norms residuals. Finally, Section [6] is 
devoted to the computational experiments. We report results for synthetic data and for the classical data set 
given in M- 


2. A FLEXIBLE METHODOLOGY FOR THE LOCATION OF HYPERPLANES 

Given is a set of of n observations or demand points (depending that we use the jergon of data analysis 
or location analysis, respectively) in a (d + l)-dimensional space, {x \,..., x n } C R d+1 (we will assume, for 
a clearer description of the models, that the first, the 0 — th, component of aq is the one that account for 
the intercept in the model, being Xiq = ■ ■ ■ = x n o = 1). Next, we analyze ways of fitting these observations 
to a linear form (hyperplane). For any y £ R d+1 , we shall denote y_o = (yi, ■ ■ ■ ,y<i)> he- the vector with 
the last d coordinates of y excluding the first one. We consider here a flexible framework for the problem 
of locating/fitting hyperplanes that includes as special cases the classical and most modern models found in 
the specialized literature. First, we assume that the point-to-hyperplane deviation is modelled by a residual 
mapping e x : R d+1 —► M+, £ X (P) = D(x_o,%(/3)), being D a distance measure in R d . This residual represents 
how “far” is the point (observation) x £ R d+1 with respect to the hyperplane %(/3) = {y £ R d : (1, y 4 )/3 = 0} 
(Some times we will write the hyperplane as (3 t X = 0, with (3 = (po, Pi,..., PdY £ R d+1 .) 

Furthermore, the residuals for each demand point are aggregated using a globalizing function $ : R” —>• R, 
which for a set of residuals £\,...,e n gives an overall measure of the deviations of the whole data set with 
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respect to the hyperplane. With this setting, ones tries to minimize such a globalizing measure of the residuals 
with respect to all the given demand points. 

With this notation, the Fitting Hyperplane Problem (FHP) consists in finding (3 £ R d+1 such that: 

(FHP ($,£)) /3 £ arg min 3>(e x (/3)), 


where £ x (f3) = (e Xl (/3), ■ ■ ■ ; £ x„(/3)) t is the vector of residuals. 

Note that the difficulty of solving FHP(<f>, e) depends of the expressions for the residuals and the aggregation 
criterion <f>. If $ and e x are linear, the above problem becomes a linear programming problem. In this paper, we 
consider a general family of aggregation criteria that includes as particular cases most of the classical ones used 
in the literature. Some of those criteria have been already considered for the sake of outlier detection 12209 
or as robust alternatives to the standard linear regression approach Bill- 

Let Ai,..., A„ £ R and let e £ R” be the vector of residuals of all of the demand points in the given data 
set. We consider aggregation criteria $ : R™ —> R + defined as: 


(!) $ ( £ ) = X! A * £ W 

i=l 

where £(,) £ {ei,... ,£ n } is such that £(i) < • • ■ < £(„). Observe that this operator defines a multiparametric 
family (called ordered median functions [32] ) that depending on the choice of the A-weights captures many of 
the models proposed in the literature. 

Note that the above shape of $ is symmetric and, for non negative lambda weights, a monotone function that 
ensures that the ordering of the individual residuals do not affect the overall goodness of the fitting. Moreover, 
it also implies that a componentwise smaller vector of residuals gives rise to a more accurate fitting. 

The natural implication of the assumption made about the definition of residuals is that, as expected, the 
response (projection) of a demand points on a given hyperplane differs from the classical evaluation and it must 
be the closest point, with respect to the distance D, in the located hyperplane "H(/3). 


Lemma 1. For a given point z t = (1, z \,..., Zd) and the hyperplane H(f.3) the response z consistent with the 
residual £ z = min y ^.u{p) H-Z-o — y|| is given by 

frz 1 ta\ 

= z -“ “ pj kMi 

where || ■ ||* is the dual norm to || • |j and k(/3) = arg max PYqX. Moreover, 

IMM 


( 2 ) 


= 


1/3^1 

ii/3-oir 


Proof. The proof follows applying [24] Theorem 2.1] to the definition of residual £ z = min y en(P) \\ z -o ~y ||- D 


From the above result, the response for a point with a unknown coordinate (w.l.o.g, the last component, d), 
namely z = (1, z±, ..., Zd-i, O) 4 , will be given by: 

ptz MP) d . 


Zd = 


ll/3-a 


Hence, differentiating z with respect to each Zj, j = 1,..., d — 1, we get 

dz d _ fa 


dz. 


ll/3_o || 


-k(/3) 


di 


which may be interpreted as the marginal variation of the d-th coordinate with respect to j-th coordinate 
whenever the other dimensions remain constant. 

Explicit expressions for such projections, namely, t\, £00 and tb-norms, for r > 1 are described in the following 
lemma. 


Lemma 2. Let z = (1, z±,..., ZdY, then 
(1) IfD is the t\- distance, 


Zk 


Zk 

(3 t z 

Zk ~ ll/3„olU Wfc > 
Zk + ll/olloo^’ 


if \Pk\ Y m ax{|/3jj : j = 1,..., d}, 
if Pk = max{\Pj\ : j = 1,... ,d}, 
if Pk = — max{|/3jj : j = 1 ,... ,d}, 


for k = 1,..., d, and for some Vi ,..., Vd > 0 such that Vj = 1. 
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(2) If D is the loo- distance, 


Zk = 


Zk ~ 
Zk + 


ll/3-olli’ 


P-olU 

(3) If D is the t T - distance with 1 < r < +oo then 


if Pk > 0, 
if Pk < 0, 


k = 1 ,,d. 


z k = z k - 




ll/3-i 


-kr (/3)fc, —1,..., c? 


0 II ^ 


and 


f sign03 fc )|j3 fc r /T 

kr(/3)fe = < (E^TWk) 177 

0 


if Pk ^ ® 
if Pk = 0 , 


k = 1,... ,d, 


The proof of item 3. follows from the Lagrangian 


6emg u such that F + - = 1. 

Proof. The proof of items 1. and 2. can be found in 
optimality condition applied to ^max P-o z - First, we observe that a Lagrange multiplier exists since the 

problem is regular at any point of the i T unit ball. Next, the Lagrangian function is L(z, A) = (3_ 0 z — 
^Sfc=i \ z kV■ Therefore, its partial derivatives are: = Pk — Ar| 2 fc| T_1 sign( 2 :fc), for all k = 1,... ,d. Hence, 

equating to zero the partial derivative, it follows that for any index k such that 0 

Pk 


(3) 


A* = 




—sign(4). 


Let us define the sets I = {k : Pk > 0}, J = {k : Pk < 0}, K = {k : Pk = 0}. Now from equation ©, and 
taking into account that ||z|| T = 1, we obtain: 

f (sign {-tpIkV 


= { (E- = iSign(z*)/3 3 - 

0 


if k € I U J 
otherwise. 


*r 2 . This 


Moreover, the hessian of L is diagonal and all its entries are negative, namely ^4 = —At(t — 1)| z t 
implies that z* and A* are local maxima. 

In the particular case of r = 2 then one can check that (/3)fc = Pk which simplifies the above expression. 


□ 


(4) 


We note in passing that e x = D||.|| (x_o, 'H(P)) an d thus, according to the Lemma[T] 

\f3 t x\ 


D|j.||(a;_o,H) = 


ll/3-o 


Observe also that when the demand points in the data set lie exactly on the hyperplane H all the proposed 


methods FHP(<f>, e) determine the same hyperplane P as an optimal fitting, for any norm-based residuals while 
using vertical distance residuals will never produce hyperplanes in the form P. = {z G R d : Po + P\Z\ + • • • + 
Pd-iZd-i = 0} since the “traditional” methods do not allow zero coefficients for the dependent coordinate. Note 
also that the vertical distance based methods assume that errors are present only in one of the components 
(the so-called dependent), so the rest of the variables should be error-free. In the proposed general framework, 
this is no longer assumed since there is no distinction between dependent and independent variables for the 
location/fitting procedure, so errors may be considered in all the components of the points in the given data 
set. 

Remark that the standard residual (vertical distance) is a distance measure that is not induced by a norm, but 
its expression can be written in a analogous form and so it fits to the shape of the distances that are considered 
in this paper. In particular, the vertical distance (with respect to the last coordinate) may be defined as: 

d—l 

Pd^C’d ^ ^ Pi%i / 

i=l 


(5) 


D v (x,H) = 


The above aggregation criteria m and residual functions © are rather general and exhibit good structural 
properties. On the one hand, they accommodate most of the already considered fitting methods in the literature. 
On the other hand, one can always exploit its properties and different representations in order to solve the 


optimization problem FHP(<f>, e) In the following we prove some structural properties that imply some sources 


of solvability of the problem on hands. 

For the sake of completeness, we recall the concept of difference of convex (D.C.) function. A function 
/ : K. d —>• R is said to be a D.C. if there exist g, h : —>• R convex functions such that / can be decomposed as 
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the difference between g and h. Optimization problems where the objective function and/or the constraints are 
defined by D.C. functions are called D.C. programming problems and they play an important role in nonconvex 
optimization because of its theoretical aspects as well as its wide range of applications (see [H]). 

Lemma 3. The globalizing function <&(e x (I 3)) is a D.C. function. 

Proof. In order to prove that the function $ is D.C. we will find a convenient representation where we can apply 
properties of the algebra of D.C. functions. To this for, we introduce the functions: 

<M/3) : = min { max{e Xii (/3) p ,..., e Xir (/ 3) p : i x < i 2 < .. ■ < i r , Vii, i 2 , ■ ■ ■, i r }}, 
for r = 1,... ,n, where e x (J3) = D||.||(x_ 0 , TL)- 

It is a simple observation that <p r {(3) coincides with the p-power r-th residual sorted in non-decreasing 
sequence, namely <p r {0) = e X(r) (/3) p for all r = 1,... , n. Hence, we get that 4>(£ x (/3)) = X^iLi Aj<^ r (/3). 

To finish the proof it suffices to prove that each function ip r is D.C. since linear combinations of D.C. functions 
are D.C.. Next, we start analyzing the residual function e x ((3) = d(x,TL(f3)). Assuming that d is a norm based 
distance given in the form of 0 or one can use those expressions to conclude that for each observation x, 
£ x {f3) is D.C. function of (3. Raising to the power p with p > 1 is also D.C., since it is the result of composing 
with a convex function (observe that residuals are non-negative). Finally, the operations of taking maxima 
and minima of D.C. functions are closed within this family m- This proves that ip r is D.C. for all r and this 
concludes the proof. □ 


We note in passing that the D.C. character of our globalizing criterion allows the application of all the 
available results on the optimization of this class of functions (see e.g. [44]). In spite of that, we can give more 
efficient representations that may help latter in the resolution of particular hyperplanes. These representations 
are based on simpler functions which replace tp by more friendly classes of functions (with regards to the 
optimization phase). 

Proposition 4. The globalizing function $(£ x (/3)) := (@) P + J2r= 2 (^r ~ A r _i)0 r (/3), where 9 r (f3) = 

max { £ x ()3) p + . .. + £ Xi (l3) p ■ c V r = 2,... ,n. (The reader may observe that the 

l 1 r *1 < *2 < • • • < J 

functions 9 r are usually called r — centrum in the specialized literature of optimization mb) 

Proof. This representation follows from the combination of the result in Lemma [3] and Theorem 3.6]. □ 


The following result states a mathematical programming formulation for the generalized fitting hyperplane 
problem, for any choice of $ and £ x . 

V 

Theorem 5. Let faq,..., x n } C R d+1 be a set of demand points, A G IPf, p = - G Q and II - || a norm in R d . 

_ s 

The Problem FHP(<I>,£) is equivalent to the following mathematical programming problem: 


n 


(LR$,||-||) 

min ^ X 3 9.j 





3 =i 




(6) 

st £ > l/3 ‘ Xi| 

iwr 

Mi = 

i) ■ ■ 

., n. 

(7) 

Zi <9j+M( 1 - wij), 

< 

Vo. 

II 

i) ■ ■ 

., n. 

(8) 

Z s > E r 

~l — C 7 1 

Mi = 

i, • • 

., n. 

(9) 

J2 w ij = i. 

Vj = 

i, • • 

., n. 


i=l 




(10) 

J2 w ij = i» 

Mi = 

ij • • 

., n. 


3= 1 




(11) 

@3 — @3 — 1) 

Vj = 

2,.. 

., n. 


Wij G {0,1}, 
z,^I^,/ 3 6R w . 

Mi = 

1) ■ ■ 

., n. 


Note that the above problem is a mixed integer non linear programming problem, whose continuous relaxation 
is in general non convex due to the constraints [Gj Apart from the mathematical programming formulation above, 
one may use alternative (in some cases better) formulations for the ordering problems as those provided in [17] , 
In particular, some important special ordered median aggregation criteria allow to have a simpler formulation 
that avoids the use of binary variables. The following result shows a better formulation for the fitting problem 
under the assumption that 0 < Ai < ... < A n . We call this setting for lambda the monotone case. 
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Theorem 6. Let {x±,... ,x n } C R d+1 be a set of demand points, A G K n , such that 0 < X± < • ■ ■ < \ n , 

y* _ 

p = - G Q with r > s G N, gcd(r, s) = 1 and || • || a norm in R d . Then, FHP($, e) is equivalent to the following 
mathematical programming problem: 


min 53 v i + 53 w i 

3 =1 *=1 

S.f. ©, ©, 

Vj +Wi> XiZj,\/i,j = 1, ... ,n, 
Zi,9i > 0, v,w G R”,/3 G K d+1 . 


Proof. The proof follows by the representation of the ordering between the residuals by permutation variables, 
which for 0 < Ai < • • • < A„, allows to write the objective function in FHP(<f>,e) as an assignment problem 


which is totally unimodular, so it can be equivalently rewritten using its dual problem. The interested reader 
is refereed to [5] for further details on this transformation. □ 


The reader may observe that, based on an alternative representation, the nonlinear constraints z® > for all 
i = 1,..., n can be transformed into a set of second order cone constraints using the following result which is a 
simplified version of Lemma 1 in [5]. This implies that those constraints can be efficiently handled by nowadays 
nonlinear solvers since they are convex and friendly for the optimization. 


Lemma 7. Let r,s€ N\ {0} with gcd(r, s) = 1, and k = [log 2 (r)J. Then, there exist variables Ui, ..., Uk-i > 0 
such that each constraint z s > s r in LR, t > m 


can be equivalently written as constraints in the form: 


u 2 < z bj e Cj , 

£ 2 < Uhuff^^^e 911 , 

Uj > 0 

with j = 1,..., k — 1 and such that 1 < aj + bj + Cj < 2 for given aj, bj , Cj G Z_j_ and dh, fh, gh G Z + such that 
dh ~\~ bh -\- Ch = 1 • 


By the above lemma, the nonlinear constraints in the form z s > e r are written as second order cone constraints 
in the form X 2 < YZ or X 2 < Y (for some choices of the variables X, Y and Z in our model). These constraints 
are then equivalent to one of the following two semidefinite constraints: 


( Y + Z 

0 

2X \ 

( Y 

0 

2X \ 

° 

Y + Z 

Y-Z >- 0, y + Z > 0 or 

0 

Y 

Y 

\ 2X 

Y-Z 

Y + Z ) ' 

V 2X 

Y 

Y J 


h 0, Y > 0. 


Hence, the difficulty of solving Problem LR$ ii.ii depends essentially on the choice of the residuals since all 


except constraints (EJ) are linear or second order cone constraints which can be efficiently handled with nowadays 
modern optimization techniques. In the next sections we analyze different choices of the residuals. 


Remark 8 (Subset Selection and Regularization). In the case where the number of points (n) is much smaller 
than the dimension of the space (d), it is common in Statistics to compute fitting hyperplanes over a smaller 
dimensional space. The new space is determined by those components that, after projecting, allows a good 
fitting when it is compared to the dimension of the new space. Several methods have been proposed in the recent 
literature to perform such a computation. If the dimension of the new space, q < d, is given, a constraint in 
the form ||/3_o||o < q (here || • ||o stands for the support function or nuclear norm, i.e., the number of nonzero 
components of the vector) may be included in the mathematical programming formulation (see |27l [8],), which 
gives rise to the so called Subset Selection Problem. If such a dimension is not known, regularization methods 
that penalize the number of nonzero elements or the size of /3_o can be applied to solve the Feature Selection 
Problem (see G2U- Note that both types of approaches can be easily incorporated in our models. 

2.1. Goodness of Fitting. After addressing the problem of locating/fitting a hyperplane with respect to a 
set of points, we will analyze the goodness of this fitting extending the well known coefficient of determination 
in Regression Analysis. For the sake of presentation, we assume that the variable that needs to be analyzed in 
terms of dependence to the others is the last coordinate Xd, or in other words Y = Xd- The goodness of fitting 
index is defined as: 

GCoD$ e = 1-, 

*5 


< 2-1 

when it is additionally required that (3 is in the form (3 = (/?o, 0,..., 0, — 1), i.e. the hyperplane is imposed 
to be constant (Xd = (3o)- Note that the components 1 ,... ,d — 1 do not appear in the model. Hence, $2 


where is the optimal value of (FHP(<f>, e)), namely $(e x ([3)), and ' s the optimal value of FHP(>I>,£) 
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measures the global error assumed by the best fitting “vertical” hyperplane; whereas GCoD$ j£ measures the 
improvement of the model that considers all the dimensions with respect to the one that omits all (except one) 
of them . Observe that this coefficient coincides with the classical coefficient of determination provided that the 
aggregation criteria is the overall sum and the residuals are the squared vertical distances: in that case A) = x.d 
(the sample mean of the dependent variable). 

The GCoD clearly verifies one of the important properties of the standard coefficient of determination, 
0 < GCoD$ £ < 1. Furthermore, one may interpret the coefficient as a measure of how good is the best possible 
hyperplane under certain criterion and residual choice with respect to the best horizontal hyperplane. When 
GCoD is close to 0, it is because ~ $q> so not appreciable improvement is given by the complete model 
(which considers all the components) with respect to the simple constant model; whenever GCoD is close to 
1, it means that d>* -C <Fq, being the proposed model significatively better than the constant model (note that 
GCoD = 1 iff <f>* = 0, i.e., when the model perfectly fits the demand points). Hence, the closer the GCoD to 
one, the better the fitting; whereas the closer to zero, the better is the constant model with respect to the full 
model. 

Observe that the above definition coincides with some of the choices to measure the goodness of fitting for 
robust alternatives to the least sum of squares methodology (see [25]). 


To obtain the GCoD, apart from solving FHP($, e) to get $*, we must also solve the problem: 

(12) $5 = min $(D(aq, H 0 ), ..., D {x n ,H 0 )), 

(3q(zM 

where Ho = {y £ R d : yd = A)} for some A) £ R. 

Lemma 9. If the residual mapping e x : R rf+1 —> R + is induced by a norm || • ||. Then, Problem \12\) is equivalent 
to 

(LRP$ ) = min $(n e \xi d - A)|, • • •, « e | Vnd - A)|), 

PoGM 

where 

1 

He = 


max Zd 

z£R d :\\z\\<l 

Proof. For the point x k in the data set, the residual under the assumption Xd = (3 0 is £x k {Po) = V(x k ,H 0 ) = 
muiyg-Ho ||xfc — y\\, where Hq = {y £ R d : yd = A)} for some A) G R. Then, by © in Lemma [T] 

| x k d - A) I 


,(A>) = 


11 ( 0 ,.-., 0,-1 


with || • ||* the dual norm of 
definition to y = (0,..., 0, —1) the result follows. 


. By definition of the dual norm ||y||* = max z t y. Hence, applying such a 

2eR d :||z||<l 


□ 


From the above result it is easy to see that K e = 1 , provided that e x is induced by any I p norm, even for the 
l\ and the loo cases. However, as we will see in Section U not all the norms have the same n e constant. 


Next, with our specifications for <!>, given by FHP(<f>, e)j the problem to be solved to obtain $g is: 


(LRP°,p) 

where e* = | Xid — A) I fo r i = 1 ,. .., n. 
Solutions to Problem LRP 


= *« min hM ■= 2 L. A * 

i= 1 


A,p 


for a given ft motivate the introduction of the concept of ordered median 


LRP", P 


point. Indeed, A) is a (A ,p)-ordered median point ((A,p)-omp in short) if it is an optimal solution to 

Some special cases of (A,p)-omp are well-known and widely used in the so-called location analysis literature. 
If A, = 1 for all i = 1,... ,n, the (A, l)-omp is known to coincide with the median, median^i^,... ,x n d), of 
{x\d ,..., x n d}', while the (A, 2)-omp is the arithmetic mean of the rr.d-values. 

In the general case, i.e. for arbitrary A and p , the ordered median points do not have closed form expressions 
mum, although they have been around in the field of Location Analysis for several years [3T| 152] - Moreover, 
they can be obtained, as shown below, t o be used in the computation of the goodness of fitting index. 

In the following we show how to solve LRP° for general choices of non-negative vectors A and p £ [1, + 00 ). 
Without loss of generality we assume that x\d < x-id < ... < x n d- Let us denote further by cqfc := Xid + Xkd the 
solution of the equation e\(/3) = e^(/9) for all * < fc, i, k = 1,..., n in the range (aq d, x n d). Let A be the set 
containing all the x.d and a points and denote by z k the fc-th point in A sorted in non-decreasing sequence. By 
construction, in the interval I k = (z k , z k + 1 ) all the functions ef(/3) are monotone for all * = 1,..., n. 

Lemma 10. The function f\ lP ((3) has at most one critical point (3* £ I k . 

Proof. For all (3 £ I k , the function f\. p is a non-negative linear combination of monotone functions. Therefore, 
its derivative can vanish in at most one point. □ 
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Let us denote by A c the set of all the critical points of the function f\ jP in the interval (x\d,x n d)- Observe 
that the cardinality of this set is 0{n 2 ). 

Theorem 11. For any non-negative vector A and p £ [1, oo) the set A U A c always contains a ( \,p)-omp. 

The reader may observe that the implication of the above theorem is that the /3 0 value can be always 
obtained by a simple enumeration of the set AU A c . Then, <f>g = n e X^"=i ^i\ x id ~ Thus, the complexity 

of computing GCoD is essentially the same as the resolution of Problem FHP(<f>,£) which must be solved to 
obtain <E>*. 


3. Classical Methods under the new framework 


In this section we show how several classical models of fitting with hyperplanes can be cast into our general 
framework. We assume that we are given a set of points {x ±,..., x n } C R d+1 . In classical models in the 
literature, the residuals are defined as the vertical distance (with respect to the last coordinate) from the point 
to the hyperplane: 


(13) 


m = 


Xd - 


d -1 

E 

fc =0 


Pk 

TTXk 


Therefore, the difference between the considered models comes from the choice of the globalizing criterion $ that 
aggregates the residuals. We have pointed out in the previous section that an important factor, in determining 
the difficulty of solving the mathematical programming problems for the fitting model, is the choice of the 
residual. This element influences much more the difficulty of the problem than the globalizing criterion. We 
shall show in this section how to handle, within this framework, the following 4 well-regarded models: Least 
Sum of Squares (LSS), Least Sum of Absolute Deviation (LAD), Least Quantile of Squares (LQS) and Least 
Trimmed Sum of Squares (LTS). These four well-known models are presented below as particular cases of our 


general framework described in FHP(<f>, e) 


A particularity of the models where the residuals are measured as the vertical distance between the point and 
the hyperplane, is that the response for a given data z coincides with z = Zd — which is the direct 

Pd 

evaluation of z over the linear function that defines the fitted hyperplane. This property will not be valid, in 
general, for residuals different from the vertical distance. 


3.1. Least Sum of Squares fitting problem. We start our analysis with the LSS method, credited to Gauss. 
It is the most widely used approach to estimate the coefficients of a linear model because its simplicity and its 
theoretical implications for the inference over the total population. However, somehow restricting hypotheses 
are required in order to be applied (see e.g. HSI). 

The LSS criterion is defined as the sum of the squares of the residuals, that is: 

n 

$LSs(ei, ■■■An) =E e ?’ 

i= 1 

where the residuals Ek are given by (USD- 

In case n > d, assuming without loss of generality that fid = 1, and that the given points are linearly 
independent, the optimality conditions of the problem allow to compute the best LSS parameters as: 

(3 = (X t X)~ 1 X t y 

where X is the n x d-matrix obtained from the sample data by columns and y l = ( Xid , ■ ■ • ,x n d) £ R ra are 
the responses of the last component of the model. Hence, the complexity of computing the parameters under 
the LSS method is 0(nd 2 ) which results from the complexity of multiplying n x d matrices. However, even 
though there is a closed form formula, it may appear numerical errors when computing the inverse of the 
matrix A'*A' if the rows of A' are linearly dependent or close to the linear dependence. Alternatively, one 
can compute the parameter /3, regardless of the degree of dependence of the variables in the model by solving 
either a quadratic programming or a second order cone programming problem; which is nowadays doable with 
on-the-shell software. 


Theorem 12. An optimal parameter (3* £ that minimizes &lss can be 
following two problems: 

n 

(LSSqp) min ^zf, 

i =1 


obtained by solving any of the 


(LSSsocp) 


d -1 


E %id ^ ^ fik^ik /^Cb 


fc= 1 


/3eR d ,zeMI[. 


(14) 
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< 2—1 


< 2—1 


Proof. Denote by = x id - /3 0 - ^ PkXik and by Wi = I x id - ^ PkXik - Po ) , for i = 1,..., n. Note that 

_ fc=i _ V 

the objective functions in ( LSSqp[ ) and ( |LSSsocpI ) coincide: 


k—1 


n n d—1 n 

} ^ = ^ ^ {%id ^ ^ fik%ik A)) = ^ ^ ^2 

2 = 1 2=1 k= 1 2=1 

Next, the minimization character of the objective function allows us to relax the equality constraint definition 
of the auxiliary variables to >-constraints and then the result follows. □ 


The reader may observe that LSS corresponds to FHP(<f>,e) with A* 
distance. 


(1,..., 1), p = 2 and e the vertical 


3.2. Least Absolute Deviation fitting problem. Another well-explored choice of residuals and criterion 
is the so called LAD method, introduced by Edgeworth in 1887. The globalizing criterion is the sum of the 
absolute value of the vertical residuals: 


n 

®lad{£ 1 , | e »|. 

i=l 


Note that LAD corresponds to the model FHP(>1>, e) for with A* = (1,..., 1) and p = 1. The optimal coefficients 
obtained with this method are known to be more robust than those by the LSS method. It follows that the 
mathematical programming model to be solved under this choice is: 


(15) 


n 

min > 

l 3eR d +i 

2=1 


< 2-1 

Xid fto ^ ) fikXik 

k=l 


(assuming w.l.o.g. that f3 d = 1). 

Observe that the above problem to compute the best LAD hyperplane can be actually formulated as a linear 
programming problem by replacing in ( |LSSsocpD the quadratic constraints by those which model the absolute 
value. 


3.3. Least Quantile of Squares fitting problem. Next, we describe another method known as Least Quan¬ 
tile of Squares, recently introduced by Bertsimas and Mazumder (7], which is a generalization of the Least 
Median of Squares (LMS) introduced by Hampel (1975). It also considers vertical distances as residuals, but 
the residuals are aggregated to minimize the r-quantile of the distribution of residuals (r can range in {1,..., n}). 

$lqs(£i, ••■,£») = r-quantile^,..., 4) := ef r) . 

which also fits to the general form of the aggregating criteria considered in this paper. In this case, following the 

(r—1) ( n—r ) 

notation introduced in Q. the LQS hyperplane can be obtained forp = 2 and A = (0,..., 0,1, 0,..., 0). (Observe 

LiJ If J 

that LMS hyperplane is also obtained within the same scheme when p = 2 and A = (0,..., 0, 1, 0,..., 0).) 

Theorem 13. An optimal parameter (3* £ R d+1 for LMS method can be obtained by solving the following 
problem: 

(LMSip) min 9 r 

s.t. 

(3 eR d ,z,9 £ Wf,Wij £ {0,1}, Vz, j = 1,... ,n, 

3.4. Least Trimmed Sum of Squares fitting problem. Finally, we present analogous formulations for the 
LTS method. This method was introduced by Rousseeuw m as a very robust alternative to the LSS method, 
in that it has a high breakdown point. With our notation, the residuals are again considered as the vertical 
distance, p = 2 but the aggregation criterion is now: 

h 

^LTsi^l-) • • • 5 ^-n) ^ ^ 

2=1 

where e^ G {si,..., £ n } with £^ < £(i+i) for i = 1,..., n — 1, and h G {1,..., n}. Note that in this problem 
one tries to minimize the sum of the h smallest squared residuals, discarding the remaining, and then, adjusting 

Tl 

the model to the h closest points. The most common choice for h is J, considering the best 50% square 

residuals to compute the hyperplane (thus, discarding the other 50% of the data). The choice of h allows to 
control which part of the data set are sacrificed to find a better hyperplane. We denote by LTS(a) the LTS 













10 


VICTOR BLANCO, JUSTO PUERTO, AND ROMAN SALMERON 


Method 

Line 

GCoD 

LSS 

y=-0.)133 x + 6.7934 

0.0442 

LAD 

y= -0.6931 x + 8.1492 

0.0065 

LMS 

y = 4x -127.6 

0.0765 

LTS(25) 

y= 4.0767 x -12.8668 

0.7328 

LTS (50) 

y =4-2105 x -13.6231 

0.6057 

LTS (75) 

y= 3.1176 x -8.8461 

0.7702 

LTS (90) 

y = 2.6620 x -6.8016 

0.6751 



Figure 1. Optimal Lines with the classical methods for the stars data set. 


method when 100 — a% of the data is discarded, i.e., the percentage of the data that may be considered as 
outliers. 

A suitable mathematical programming formulation for the LTS(a) method is stated in the following result. 

Theorem 14. An optimal parameter (3* £ R“ +1 for LTS (a) method can be obtained by solving the following 
problem: 

[an] 

(LTS(a)ip) min 9j 

i=l 

s.t. dzD, (ED — <HU), (HU), 

(3 £ R d+1 , z, 9 £ R B , Wij £ {0,1}, Vi, j = 1,..., n. 

We illustrate the differences of the above classical models in a well-known data set that appears in [23. The 
algorithms were implemented in R with the Gurobi callable library. 

Example 15. The data considered in this example consists of 47 points in R 2 about stars of the CYG OBI 
cluster in the direction of Cygnus [42]. The first coordinate, X\, is the logarithm of the effective temperature 
at the surface of the star and the second one, X 2 , is the logarithm of its light intensity. This data set has also 
been analyzed in [23 and [12j; among others. 

We run the LSS, LAD, LMS and LTS(a) with a £ {25,50,75,90}. The obtained lines and the goodness of 
fitting (GCoD$ e { are shown in Figure[l\ 

Observe that the LSS and LAD models were not able to adequately fit the the data while the others (which 
are somehow similar) show their better performance against the outliers. Note also that GCoD reflects this fact, 
although it is not clear whether LTS(75) (the one with the largest GCoD} is better than the others. 

In order to show the behavior of the LTS models and which are the results of their optimal fitting lines, 
Figure Q] shows the fitting lines that minimize the 25%, 50%, 15% or 90% of the residuals and the points that 
the corresponding optimization problems discard (filled dots in the subfigures) to reach the fitted lines. 

Apart from the classical models described above, the standard vertical distance residuals may be aggregated 
using a general $ function as those introduced in © providing a wide family of new methods to compute 
the coefficient of the best fitting hyperplanes. Also, linear constrained versions of the above methods may be 
considered by adding the adequate constraints to the corresponding formulations. Furthermore, many other 
alternative methods that use vertical distance residuals as MINSADBED or convex combinations of LSS and 
LAD methods [2] can easily be cast into our modelling framework. The formulations that allow solving those 
problems are rather similar to those already presented in this section and therefore are left for the interested 
reader. 


4. Fitting Hyperplanes with block-norm residuals 

In this section we present models to compute the parameters of fitting hyperplane when the distance point- 
to-hyperplane is assumed to be a block-norm distance between the point and the closest point in the hyperplane; 
and the aggregation criterion is considered in the general form given by FHP($, e) Recall that a block norm is 
a norm such that its unit ball is a polytope symmetric with respect to the origin and with non empty interior. 
Block norms, also referred to as polyhedral norms, play an important role in the measurement of distances in 
many areas of Operations Research and Applied Mathematics as for instance in Location analysis or Logistics. 
They are often used to model real world situations (like measuring highway distances) more accurately than 
the standard Euclidean norm. In addition, they can also be used to approximate arbitrary norms since the set 
of block norms is dense in the set of all norms S3- 
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Figure 2. Estimated models and discarded points in LTS models. 


We denote by || • \\b the norm in R d whose unit ball is given by a symmetric with respect to the origin, with 
non empty interior polytope B , i.e. B = {a; £ R d : ||a||_B < !}■ Let Ext(R) = {b g : g = 1,..., G} be the set of 
extreme points of B and B° the polar set of B which is defined as: 

B° = {v £ R d : v% <l,g=l,...,G} 

and Ext(-B°) = {6°,..., b^o}. 

The following result characterizes the expression of a block-norm distance in terms of the extreme points of 
the polar set of the polytope B. 


Lemma 16 (Ward and Wendell [46l [47l ). Let B be a polytope in R d and x £ then: 

Hails = max{|a*6°| : g = 1,..., G 0 }. 


Special cases of block norms are the Manhattan (if) and the Chebyshev {ioo) norms for adequate choices of 
the extreme points of the unit balls. For instance in R 2 , such distances are characterized by the following set 
of extreme points of their unit balls, {±(1,0), ±(0,1)} and {±(1,1), ±(1, —1)}, respectively. Any block norm 
|| ■ ||s in R d induces a distance between vectors x,y £ R d given by D s{x,y) = ||a — p\\b- 

Given a set of points {aq,... ,x n } C R d and a polyhedral unit ball B, our goal is to obtain the hyperplane 
"H(/3) = {(/ £ l d : (l,y t )f3 = 0} such that the overall distance Ds(-, •) from the sample to H((3) is minimized 
according to the globalizing criterion $ (for 1 < p = j £ Q). That is: 


(RM B ) 


min 

iSeR 1 ^ 1 


n 



i= 1 


where for any x £ R d , e x = Db(x,H((3)) = min,^^) D b(x,z), is the “|| • Hs-projection” of x onto the hyper¬ 
plane H(/3), and ep) denotes the element in {e Xl i ■ ■ ■, £x n } which is sorted in the i-th position (in nondecreasing 
order). 

We recall that according to equation 0 in Lemma [T| for any polytope B symmetric with respect to the 

\(3 t x\ 

origin and with non empty interior, and "H(/3) = {y* S R d : (1 ,y 4 )^ = 0} then Ds(a_o, 'H(P)) = -rr ———, 

IIP-oIIb 0 

where B° is the polar set of B and a* = (1, Xi,..., Xd) £ R d+1 is a given point. 


Lemma 17. Let (3* £ R d+1 be an optimal solution of |RMs| with f3*L n ^ 0. Then, f3' = ---— is also 

_ ^IIp-oIIb 0 

an optimal solution o/ |RMg| with H/S^qH^o = 1. Thus, there is an optimal solution o/ |RMgl (3, that verifies 
D s (x_o,'H(/3)) = \f3 t x\ for any x 4 = (l,ai,... ,x d ) £ R d+1 . 
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From the above lemma, we have 

Theorem 18. Let {rci,..., x n } C R d+1 be a set of points and let B C be a polytope with Ext(B) = 
{b ±,..., bo}- Then. |RMb| is equivalent to the following disjunctive programming problem 


n 


(LRP^b) 

P\B) 

■= min Ajfl, 

3 = 1 


s.t. 

0-au 

(16) 


Si > (3 t Xi, Vi = 1, . 

(17) 


£i > —f3 t Xi,'ii = 1 

(18) 


(3 f - 0 b g < 1, Vp = 1 

G 

(19) 


\/ (3- 0 bg = 1, 


3=1 


P G R d+1 ,z,9,e G M". 
Wij G {0, = 1 


Proof. Let us denote by e* = T>B{xi,'H((3)). By Lemma [H e,; = ——d—_ Furthermore, by Lemma ITT! we 

IIp-oIIb° 

can assume that ||/3_ 0 || B o = 1, hence e* = 1(3*Xi\ (constraints (ITtH) and (fI71) L By Lemma flbl ||/3_ 0 IIs° = 
d 

max{| Pibgi\ : g = 1,..., G} since (B°)° = B. Hence, there exists go G {1,..., G} such that \\/3_o\\b° = 1 

i=l 

d d 

(disjunctive constraint (fTUlL and thus ^^/3kb g k < ^ Pkb go k = 1 (constraint (|T%|) h (Note that absolute values 

k =1 k =1 

do not need to be taken explicitly into account since if b g G Ext(S), then — b g G Ext(B).) □ 


The above problem can be equivalently written as an unique mixed integer second order cone programming 
problem once constraints (J5j) are transformed using the result in Lemma Q and binary variables are added to 
decide which go is chosen to verify constraint m■ By the same token, this problem can be also equivalently 
rewritten as G different SOCP programming problems (each of them fixed to verify one of the disjunctive 
constraints). Furthermore, MINLP disjunctive programming techniques (e.g. [U, [22] 1 may be used to solve the 
corresponding problem. The following result states a MINLP formulation for |RM b\ 


Corollary 19. Let {x \,..., x n } C R d+1 be a set of points and let B C 
{&i,..., bo}- Then, LRP^s is equivalent to the following problem: 


be a polytope with Ext (B) = 


n 


(LRP^b) 

P*(B) 

:= min^X jOj 

q - 1 


S.t. 

J - - 1 - 

o - jni) 

(20) 


Si > Pixi.Vi = 1,... ,n, h = 1,..., G, 

(21) 


Si > -Pixi.Vi = 1,... ,n, h = 1,..., G, 

(22) 


Ptohbg < 1, Vg = 1,..., G, h = 1,..., G, 

(23) 


P'-o h bh = £h,h= 1,...,G, 

G 

(24) 


h =i 

z,6,e G M n , 

P h £R d+1 ,,f h £{0,l},Vh = l,...,G, 
Wij G {0, l},Vi,j = 1 ,...,n. 


Some special cases for the globalizing criterion $ allow even simpler formulations reducing considerably the 
computational complexity of the problems. In particular, when A; = 1 for all i = 1,. .., n, the integer variables 
representing ordering {w l; j ) can be removed from the above formulation. 

The following result will allow us to consider polyhedral norms which are dilations of other polyhedral norms, 
i.e., polyhedral norms || • for some bounded polyhedron B and /i > 0 (pB = {p z : z G B}). 


Corollary 20. Let B be a polytope and p > 0. Then, if f3* is an optimal solution for |LRP<I),b for B = 
(3 = j^/3* is an optimal solution for LRP^s when B = pB. Moreover, p*(pB) = j-^p*(B). 


= B , 
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Proof. It is sufficient to observe that for any (3 G 

||(/3i,...,/3 d )|| At5 o =max{|/x6*/3‘| :g=l,...G} 


a 

it M 


= g ma.x{\b (3 | ■ g = 1,...G} = g\\(Pi ,.. ■, Pd)\\-, 


Since ^^(ei, ■ • ■, £ n ) = ^y<Pg-(£i,..., e n ), we get the relation between the optimal values. Let (3* be 

1 


an optimal solution of LRP$ b Then, — (3 is clearly a feasible solution to LRP$ b when B = gB since 

-^ g - 

ll(f ft, ■ ■ ■, -fflh b° = ll(ft> ■ ■ -.03)113° = 1- □ 

g g 


For the sake of c omputing GCoD, for solutions to problems with block-norm residuals, note that the one 


dimensional problem 
constant K e when the 

B. 


LRP 

residua 


does depend on $ and also on the residuals through K e . Let us denote by kb the 
s e x are defined as the block-norm projection with unit ball given by the polytope 


Corollary 21. Let B C M. d be a polytope. The Goodness of fitting index, GCoD, when the residuals are defined 
as the block-norm, distance with unit ball B, can be computed as: 

$>* 


GCoD$ e = 1 - 


max JM, 


s=i ,-,G 


E I x id - ((A,p) - omp(a;. d ))p 

with residuals measured with the polyhedral norm 


LRP^ e 


where ( X,p)-omp(x.d ) is the solution to the problem 
with unit ball B. 

Proof. By Lemma [9] the goodness of fitting index GCoD$ e can be computed as: 

$* 


(25) 

where kb = 


GCoD$ £ = 1- 


min^gR <&(n B \xid — /3 0 1, - - -, «b| x n d - /3o|) ’ 


max Zd ' 

zGB 


Observe that since B is a polytope then the above maximum is attained in an extreme point of B , namely 
£»i,..., mid thus kb = 


Next, the problem 


max b gd 
g=l,...,G 


LRP 




in this case can be expressed as: 

n 

k b min V' A i\x. d - /3o|f,v 


Recall that this is a (A ,p) Ordered median problem and that its optimal solution, a (A,p)-omp, can be easily 
obtained by the result in Theorem [III Replacing the optimal solution to this problem in (1^51) it results in: 

<j>* 

max \bgd\- 


GCoD$, e = 1 - 


Y I x id - (( A >P) “ omp(a:. d ))| r 


5=1, ...,G 


i=1 


□ 


Note that for A = (1,..., 1) the (A, l)-omp is the standard median point and thus the expression E | Xid 

2=1 

median^.d)! is what it is usually called the mean absolute deviation with respect to the median. It is a well- 
known criterion to find robust optimal hyperplanes of the mean value and a direct measure of the scale of a 
random variable about its median with many applications in different fields (see ESI). 

We illustrate the behavior of the block-norm residuals fitting hyperplanes with the same data set used in the 
Section [3] 

Example 22. We consider again the stars data used in Examvle \15\ In this case, we run our implementation in 
R for I\-norm , I^-norm and hexagonal norm (as the one used in [32j with Ext (B) = {±(2, 0), ±(2,2), ±(—1, 2)}) 
residuals. We use three different criteria: overall SUM (X = (1,..., 1) and p = 1), MAXimum (X = (1,0,..., 0) 

K n-K 

and p = 1), K-centrum (X = (0,..., 0,1,..., 1)) for K = [0.75nJ (the model will minimize the sum of the 25% 


K 


n-K 


greatest residuals) and anti-K-centrum (X = (1,..., 1,0,..., 0)j for K = [0.5nJ (the model will minimize the 
sum of the 50% smallest residuals). The results for all the combinations and the graph for the K-centrum lines 
are shown in Figured 












14 


VICTOR BLANCO, JUSTO PUERTO, AND ROMAN SALMERON 


Method (<f>, e) 

Optimal Line 

GCoD$ )£ 

(SUM, h) 

y = 7®-25.81 

0.6505853 

(SUM, l x ) 

y = 5.25a; H—18.1425 

0.7009688 

(SUM, Hex) 

y — 7x — 25.81 

0.6505853 

(MAX, l x ) 

y = -3.230769a: + 18.77577 

0.5336373 

(MAX, 4J 

y = -3.230769a: + 18.77577 

0.6438685 

(MAX, Hex) 

y = -3.230769a: + 18.77577 

0.6438685 

(kC, l x ) 

y = -4.307692a: + 23.03346 

0.4628481 

(kC, l^,) 

y = -2.493333a: + 15.67113 

0.5921635 

(kC, Hex) 

y = 7.642857a; + -28.67929 

0.8317972 

(AkC, lx) 

y = 5.6a: - 19.804 

0.8443055 

(AkC, l x ) 

y = 4.869565a: — 16.41565 

0.8426523 

(AkC, Hex) 

y = 5.473684a: - 19.28316 

0.6431602 



Figure 3. Optimal lines obtained with block-norm residuals for the stars data set. 


Note that different situations may happen when running the different models: in the case of the SUM criterion 
the models for £\ and hexagonal residuals coincide; in the MAX criterion the three optimal lines are the same, 
and for the K-centrum and anti-K-centrum the three models are different. Furthermore, even in the case when 
the models coincide, one may have different goodness of fitting indices due to the different way of measuring 
distances (see the £\ and hexagonal residuals for the MAX criterion). 

From the above, we observed that the GCoD are not comparable when different residuals are used in the models 
since the value given to the residuals (both with respect to the best model and with respect to the simplified model 
with only intercept) is different. Thus, the generalized coefficient allows us to compare the goodness of fitting 
between models provided that the distance (to measure the residuals) and the aggregation criterion are fixed. 


5. Fitting Hyperplanes with £ t distances 

In this section we present the mathematical programming formulations for computing the optimal hyperplanes 
when the residuals are defined as £ T distances between demand points and the linear body. Recall that the 
t? T -norm in R 6 *, with r > 1, is defined as: 


(£“')' 
max {>fc|} 


ifr < oo, 
if t = oo 


for any z = (zi,..., zf)* G R d - From this norm we denote by (z, y) = \\z — y || T the £ T -distance between the 
points z,y G R d . The well-known Euclidean distance that measures the straight line distance between points in 
R d is the ^ 2 -norm in this family. Note that the extreme cases of i\ and £ aQ represent both block and £ r -norms, 
since their unit balls are polytopes but also fit within the family of £ T -norms. 

We recall that according to equation © in Lemma [TJ for any r = - G Q with r > s G Z+, gcd(r, s) = 1 

and TL{)3) = { y 1 G M d : (l,y t )/3 = 0} then D T (z, 'H(fi)) = .. "j where v is such that — + — = 1 (for r = 1, 

\\P-o\U T v 

v = oo while for r = oo, v = 1). 

In this section we will assume that the residuals are defined as the shortest distance from the points to 
the fitted hyperplane, namely to their projections, under a given i T norm. In other words, for a given point 
x = (1, Xi,.. ., x d y the residual is: £a(/3) = D r (i_ 0 , H(/3)). 

As in previous sections, for a given set of points {x\,... ,x n } C R. d+1 , the computation of the parameters 
f3 G R d+1 assuming that the globalizing criterion is $ and the residuals are measured with t? r -distance can be 
obtained by solving an adequate optimization problem. 


Theorem 23. Let {a;i,..., x 

and || • || T a £ T -norm in R d . 
problem: 


„} C R d+1 be a set of points, X € R n , t = - £ Q with r > s € N and gcd(r, s ) = l, 

s 

The Problem FHP($,£) is equivalent to the following mathematical programming 
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(LPR$ £ t ) minA jOj 

3 =1 

s.t. 0 - ciD,©- nm 

(26) ||/3_ 0 |k = 1, 

Wij £ {0,1}, Vi = 1,..., n, 

z,8e R”,/3eK d+1 . 

Note that the above problem is nonconvex for 1 < r < oo because of the binary variables and constraint 
(03). Approximation schemes are available in different free and commercial solvers, although no guarantee of 
optimality is provided (e.g., NLOPT, MATLAB, Minotaur, ...). In what follows we describe an approximation 
approach based on some linear approximations of the problem. 

Let P be a polyhedron such that P C B = {z £ R d : ||z||„ < 1}, and denote by rp = sup || 2 || p=1 || 2 ;||^ (note 
that by construction rp < 1). Observe that rp is the radius of the smallest t^-ball containing P. In addition, let 
Q be a polyhedron such that B C Q , and denote by Rq = inf || 2 || Q=1 \\z\\ v (note that by construction Rq > 1). 
In this case Rq is the radius of the largest K ,,-ball contained in Q. 


Theorem 24. Let Ai, 

(27) 

(28) 


• ■, A n > 0 and the globalizing function $(£ 1 ,..., e„) = ^i £ u) then: 

i=1 

r < 

r P 

h q 


Proof. By the relations between the norms, it is clear that ||z||p > ||z||„ > rp||z||p. Let "H(/ 3) = {z £ K d : 
(l,z*)/3 = 0}. Then, for any x £ R d , the above relationships imply the following inequalities relating the 
distances with respect to || • ||p*-residuals and || ■ || T -residuals: 


and 


Dp* (x-o,T~L(f3)) 


1/3*-'cl < 1/3* si 
II/3 -oIIp “ 11/3—0||t* 


< D r (x_ 0 ,H(/3)) 


D r {x-o,nm = < l^ 1 < ±-Dp.(x- 0 ,H(J3)) 

ll/3—o lit* rp\\f3_ 0 \\p r P 

n 

Let us consider the globalizing criterion $(ei,. .., e„) = ^ A i £ fiy Then, the evaluation of $ with respect 

i=l 

to the residuals computed with the polyhedral norm with unit ball P* and the /V-norm, namely e^p* = 
Dp* (xi’-o, 3/(/3)) and £i/ r = D r (cc^-o, 7/(/3)) for all i = 1 ,..., n, satisfies: 

$(e P .) < 4>(e^) < ^-$(ep*). 

r p 

This equation proves 423 . 

Next, it is clear that ||c||q < \\z\\ v < I?q||z||q. Now, using an argument similar to the one above we conclude 
that 


D Q ,(x-o,n(f3)) = 


1/3* 


> 


1/3* 


II/3-oIIq - 11/3—0||t 

\f3 t x\ |/3* x 


> D r (x-o,H(/3)) 


11/3- 


0 II ^ 


> Ali/Uoii* a 


From these inequalities it clearly follows 


□ 


Let Pn be a symmetric with respect to the origin polytope with N vertices, {p±,... ,pn}, inscribed in the 
i v hypersphere B = {z £ R d : ||;z||„ = 1} and let rp N be the radius of the smallest ball centered at the origin 
containing Pm- Let Rq n = yy— and denote by Qn the Rq n -dilation of Pn- By construction Pn C B C Qn- 

n 

Hence, for the globalizing function <J>(ei,..., e n ) = ^ A,eL, by the Theorem l24l we get that: 

i=l 

max{$(e P .), -y—$(e Q ^))} < $(e* T ) < min{4>(£ Q ^), — $(e P * )} 
k Qn r P N 
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Furthermore, by Corollary [2D1 since Qn is a dilation of P/v, both problems have the same optimal solutions 
and $(ep*) = r s P ${e Q * N ). Hence, 

-r $ (£pj) < $00 < $0 qO 

r PN 

It is clear from its definition that rp N gives the approximation error whenever a £ ,,-norm is replaced by a 
polyhedral norm with unit ball Pjv- This measure can be explicitly computed from the set of inequalities that 
describe the polyhedron. 

Lemma 25. Let P = {z G M. d : aiX < bi,i = 1,..., N} be a polytope, then: 


rp = max 


ad 


Proof. First, note that rp = sup || z || p=1 ||z||„ = max ||z||„ by the compactness of P. Thus, rp is the £„-inradius 

||z||p=l 

of P. Next, by [H], the radius of a £„ ball centered at the origin and reaching the facet {x G : a\x < b} of P 

|6d 

is the l v projection of the origin onto that facet, namely 


among the N facets defining P. 
Theorem 26. Let {xi ,..., x n } C 


ad 


. Hence, rp is the maximum of those distances 

□ 


1 be a set of demand points, A G 


l+, t = — G 
s 


with r > s G N, 


gcd( 7 ’, s) = 1 and the globalizing function 4>( £ i,..., s n ) = A,e^. The following problem provides a lower 


bound for Problem LPR$.£ t 


»=l 


n 


(Inner-l? r ) 

p* := min ^ X 3 0, 



3= i 



s.t. ©-uni) 


(29) 

O) 

IV 

AS 

sT 

Vi = 

(30) 

1 /3—oILpn — 1 ) 

Wij G {0,1}, 
z,«Gt{,j3G R d+1 

Vi = 


Furthermore, p* < < ~p-p*. 


Corollary 27. For any data set {xi ,..., x n } C R rf+1 and any £ T -norm with 1 < r < +oo there exists a 
polyhedral norm || • ||s whose unit ball B has at most 2n extreme points and such that the optimal values of 


Problem LPR$ j t and LRP$ p coincide. 


In [23] the authors propose a measure of the goodness of approximating a given norm by another norm. This 
measure was defined in order to quantify the approximation errors when modeling road distances between cities. 
We redefine this measure to evaluate the approximation errors when approximating £ T norms via polyhedral 
norms: 

(D T (xj,/3) - Dp(xi, (3)) 2 


SD = 


E 

i =1 

D T (xi,{3)>0 


D r{Xi,P) 


Example 28. Let us consider again the stars data from Examvle \15\ We run now the models using as aggrega¬ 
tion criteria the overall sum of the residuals (<& = SUM) and the residuals are the i T projections of the points 
onto the optimal line, for r G {1.5, 2, 3}. The obtained estimations for the aggregation criterion $ = SUM and 
their goodness of fitting (CCoD$ jE/ ) are shown in Table[J\ The obtained lines are drawn in Figure [3] 

Observe that for this data set, getting high accuracy for the £ T -norm residual problems is possible using small 
number of vertices (N) in the approximation by polyhedral norms. As expected, increasing the number of vertices 
improves the accuracy at the price of increasing the computation times. 

We also computed the optimal lines for different aggregation criteria ('<!> G {SUM, MAX, kC, AkC}) with 
£ T residuals, r G (1.5,2,3}, using the polyhedral approximation approach with N = 480 vertices. The results 
are shown in Table H The reader may observe from these results that the approximation error, although tiny, 
depends both of the chosen residuals and aggregation criteria. 

Finally, we compare our approximation scheme for l T residuals, on this data set, with other available im¬ 
plementations. Orthogonal Distance Regression (ODR) is a particular case of our general framework where 
£2 residuals are chosen and <E> is the sum of squares aggregation criterion (note that both approaches coincide 
when the coefficient of the dependent coordinate is non zero while such an assumption is not imposed in our 
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Table 1 . Estimated models with minisum criterion in Example 1151 


r 

N 

$ 


GCoD 

Rp 

r P 

Time 

SD 


16 

(36.81, -1, 0.14) 

11.1851 

0.6505 

0.9848 

1.015 

1.0 

7.26 x 10" 5 

1.5 

80 

(36.84, -0-99, 0.14) 

11.1324 

0.6508263 

0.9993 

1.0006 

1.91 

6.06 x 10" 6 


320 

(36.83, -0.99, 0.14) 

11.1111 

0.6509203 

0.9999 

1.0000 

14.16 

9.41 x 10" 9 


16 

(36.81, -1, 0.14) 

11.1851 

0.6505 

0.9801 

1.0195 

1.04 

7.87 x lO" 3 

2 

80 

(36.19, -0.98, 0.14) 

16.3103 

0.654216 

0.9922 

1.0001 

2.01 

1.91 x 10" 7 


320 

(36.19, -0.98, 0.14) 

16.3100 

0.654211 

0.9999 

1.0000 

16.53 

1.64 x 10" 7 


16 

(34.35, -0.96, 0.16) 

14.1283 

0.6611 

0.9801 

1.0202 

1.01 

4.56 x lO" 3 

3 

80 

(34.09, -0.95, 0.16) 

14.1621 

0.66421 

0.9992 

1.0001 

2.04 

3.50 x 10" 6 


320 

(34.08, -0.95, 0.16) 

14.1468 

0.6643 

0.9999 

1.0000 

11.48 

4.68 x lO” 10 


Figure 4. Estimated lines for the data in Example [15] approximating by a {16,80, 320}-gon. 


- iix = h(N = 16) 

. l*(N = 16) 



-fi.s(JV = 80) 

- t a (N = 80) 

- £ 3 (N = 80) 


- e i.5 (N = 320) 

- e a (N = 320) 

. l 3 (N = 320) 




3.9 4.1 4.3 4. 

4.7 4.9 3.5 3.7 3.9 

^1.5 

4.1 4.3 4.5 4.7 4.9 

A 

3.5 3.7 3.9 4.1 4.3 4. 

G 

SUM 

Line 

GCoD 

SD 

y = 5.92® - 21.1016 
0.6643 

3.36 X 10~ 10 

y = 6.75® - 24.6975 
0.6542 

1.73 x 10 1 ° 

y = 7® - 25.81 
0.6509 

1.65 X 10” 9 

MAX 

Model 

GCoD 

SD 

y = -3.2307®+ 18.7757 
0.5805 

4.07 X 10" 14 

y = -3.2307® + 18.7757 
0.5544 

1.90 x 10 -12 

y = -3.2307® + 18.7757 
0.5381 

3.85 X 10 -13 

kC 

Model 

GCoD 

SD 

y = -2.8133® + 16.9367 
0.5111 

3.51 x 10 -13 

y = -3.1756® + 18.5100 
0.4790 

7.53 x 10 1 ° 

y = -4.3076® + 23.0334 
0.4650 

9.70 X 10 1 ° 

AkC 

Model 

GCoD 

SD 

y = 6.75® - 25.0875 
0.8092 

7.15 X 10 1 ° 

y = 6.5555® - 24.1533 
0.82512 

2.10 x 10~ 9 

y = 5.175® - 17.7146 
0.8217 

5.49 x 10 1 ° 


Table 2. Optimal lines for different criteria and l T residuals of Example [28] 


models). The package pracma in R allows to compute ODR by using an approximated iterative procedure (see 
m)- The models obtained with both approaches are shown in the following table, were one can observe that, 
for this data set, our approach to approximate £ T distances by polyhedral norms (with N = 320 vertices) has a 
better performance on the global error measure of the models (although as expected the models obtained by both 
methods are almost the same): 



ODR 

SOS-i -2 (SD=9.93 x io- n ; 

Model 

y = 

-7.057362 + 35.42935 

y = -7.0980622 + 35.60477 

Global Residuals 


3.959383 

3.662783 


6. Experiments 

We tested the proposed models for different data sets in order to show the applicability and the differences 
of some of the methods detailed in the sections above. Our formulations have been coded in Gurobi 6.0 under R 
and executed in a PC with an Intel Core i7 processor at 2x 2.40 GHz and 4 GB of RAM. As far as we know, the 
battery of experiments that we performed has never been considered in the literature, since we have compared 
42 different methods (several combinations of aggregation criteria and residuals measures). 

6.1. Synthetic Experiments. We consider a set of randomly generated points with different peculiarities in 
order to test and compare the described methodologies, following similar schemes that those proposed in [7j. 
We generated n = 100 data points in dimension d € {2,4}, {x \,..., x n } C R d+1 as follows. Each Xtk follows an 
independent and identically distributed Gaussian distribution with mean 0 and standard deviation 100. We fix 
/3* = (0,1,..., 1) € M d+1 . The last coordinate, xd, is chosen as the response and we generate it as: 
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Aggregation criteria 

SUM 

n 

2=1 

MAX 

max Si 

2 =l,...,n 

MED 

median(ei,..., e„) 

kC 

|0.5nJ 

£ {i) 

2=1 

AkC 

n 

E £ « 

i=[0.5nj+l 

SOS 

n 

2=1 

1.5SUM 

n 

X>? 

2=1 


Residuals 


V 


Table 3. Combinations of chosen aggregation criteria and residuals. 


d -1 

x id = - ^2 Xik + Ui ' Vz = 1, . . . , 71, 
k—1 

where iq is also generated as a Gaussian distribution with mean 0 and standard deviation 10. 

Then, 15% of the data are now corrupted by adding an extra Gaussian term (with mean 0 and standard 
deviation 500) to: (1) all the components except the last one or (2) to the last coordinate. 

For each one of the generated data sets, we run the models that results from the combination of the following 
aggregation criteria and residuals detailed in Table 0 

Tables []][7] report, for each battery of generated data, the following information: i) the coefficients of the 
optimal hyperplane (/3), ii) the goodness of fitting index GCoD, iii) the percentage of the sample data which 
are contained in a strip delimited by two parallel hyperplanes to y = /3x with (orthogonal) distance e = 10 (%), 
and iv) the width of the strip that is necessary to include 90% of the data (ego). 

We conclude, from the experiments for the bivariate case, that in general a better performance is observed 
in all the methods when the corrupted coordinate is the dependent one (Y), as compared with introducing the 
corruption on the independent coordinate (A''). In particular, the SUM, the 1.5SUM and the kC criteria (for 
vertical distance residuals) get better fitting models in the Y-corrupted case. Although slightly better, almost 
similar results were obtained for the AkC, MEDIAN and kC (for £ T residuals) due to the robustness of those 
criteria. Also, we observe that for the A-corrupted case, the linear residuals (V, l\ and t^) models coincide 
for all the criteria except the AkC. This is not the case in the Y-corrupted experiments, where equal or similar 
models were obtained for all the ^ r -residuals. Observe that although in the A-corrupted case the larger % 
seems to imply a greater GCoD, that is not the case in the Y-corrupted experiments where one can find many 
combinations of criteria-residuals where that behavior does not happen. 

Similar conclusions can be derived from the multivariate case {d = 4), except that in this case there are no 
coincidences between the models obtained with different combinations of criteria and residuals. Furthermore, 
the convenience of using measures for the goodness of fitting which are not criterion/residual dependent is 
confirmed. 

6.2. Data: Durbin-Watson. We also performed some experiments over the classical real data sample used 
in [16]. The data aims to analyze the annual consumption of spirits from 1870 to 1938 {n = 69) from the 
incomes and the relative price of spirits (deflated by a cost-of-living index). Hence, the variables observed 
in this data sets are the logarithms (the coefficients are then interpreted in terms of percent change) of the 
following measures: Ai (Real income per head), A 2 (Relative price of spirits) and A 3 (Consumption of spirits 
per head). 

For illustrative purposes, we analyze both the global model with the three variables (d = 3) and the bivariate 
model considering Ai and A 3 and obviating X 2 (d = 2). 

6.2.1. Bivariate model. First, for the case d = 2, we run the 42 models (Tabled over the data set where X\ 
(income) and A 3 (consumption) are measured. The obtained hyperplanes are detailed in Table [ 8 ] and the fitted 
lines drawn in Figure [5] Note that the methods that use vertical distance residuals were not able to capture 
the actual behavior of the consumption with respect to the incomes. Furthermore, the MAX criterion seems to 
fail for any choice of residuals, since it tries to explain the unique outlier point that exists in the data set. The 
rest of the hyperplanes, with minimal deviations, have a similar behavior. In order to analyze the differences 
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Table 4. Results for bidimensional experiments corrupting the A' variables. 



V 


l oo 

SUM 

p 

GCoD 

% 

£90 

(-1.9587,0.3011, 1) 
0.1456 

8% 

141.2995 

(1.9587, -0.3011, -1) 
0.1456 

8% 

141.2995 

(0.4240, -0.9403, -1) 
0.5342 

65% 

87.0871 

MAX 

P 

GCoD 

% 

£90 

(10.9038, 0.1571,1) 
0.1484 

10% 

158.9295 

(10.9038,0.1571, 1) 
0.1484 

10% 

158.9295 

(10.9038,0.1571, 1) 
0.2641 

10% 

158.9295 

SOS 

P 

GCoD 

% 

£90 

(-3.1753,0.1860, 1) 
0.2261 

8% 

157.7177 

(3.1753, -0.1860, -1) 
0.2261 

8% 

157.7177 

(-1.8549,0.2858, 1) 
0.4925 

9% 

143.1279 

1.5SUM 

P 

GCoD 

% 

£90 

(-3.5386,0.2112, 1) 
0.1812 

8% 

152.361 

(3.5397, -0.2112, -1) 
0.1812 

8% 

152.3626 

(0.3967, -0.4136, -1) 
0.4499 

8% 

127.4389 

kC 

P 

GCoD 

% 

£90 

(-3.0188,0.2328, 1) 
0.1226 

8% 

150.5599 

(-3.0188,0.2328,1) 
0.1226 

8% 

150.5599 

(0.3503,0.9091, 1) 
0.4275 

60% 

85.1974 

AkC 

P 

GCoD 

% 

£90 

(5.8180,0.7718, 1) 
0.6735 

29% 

77.4723 

(2.2956,0.7734,1) 

0.9040 

34% 

74.8420 

(2.6795,0.9874, 1) 
0.9758 

70% 

92.8187 

MED 

P 

GCoD 

% 

£90 

(6.1846,0.7795, 1) 
0.7021 

31% 

78.4775 

(6.1842,0.7795,1) 

0.8690 

31% 

78.4772 

(1.3314,0.9890, 1) 
0.9741 

70% 

91.9773 



^1.5 

r 2 

r 3 

SUM 

p 

GCoD 

% 

^90 

(-0.2603, -0.9299, -1) 
0.4133 

62% 

86.7791 

(-0.2603, -0.9299, -1) 
0.3417 

62% 

86.7791 

(-0.2603, -0.9299, -1) 
0.2615 

62% 

86.7791 

MAX 

P 

GCoD 

% 

£90 

(-10.9038, -0.1571, -1) 
0.1821 

10% 

158.9295 

(-10.9038, -0.1571, -1) 
0.1588 

10% 

158.9295 

(10.9038,0.1571, 1) 
0.1495 

10% 

158.9295 

SOS 

P 

GCoD 

% 

£90 

(2.4728, -0.2391, -1) 
0.3163 

8 % 

149.8204 

(-2.8551,0.2102,1) 
0.2552 

8 % 

151.9362 

(-3.1181,0.1903, 1) 
0.2295 

8 % 

156.6873 

1.5SUM 

P 

GCoD 

% 

£90 

(3.4138, -0.2225, -1) 
0.1853 

8 % 

149.6913 

(3.0670, -0.2704, -1) 
0.2145 

9% 

145.969 

(1.4864, -0.3260, -1) 
0.2799 

7% 

135.7776 

kC 

P 

GCoD 

% 

£90 

(-2.6422, 0.2474, 1) 
0.1263 

9% 

147.9623 

(-0.2632, -0.9011, -1) 
0.1913 

57% 

84.4867 

(-0.3503, -0.9091, -1) 
0.2791 

60% 

85.1974 

AkC 

P 

GCoD 

% 

£90 

(-0.0741,0.9357, 1) 
0.9468 

64% 

86.9840 

(2.2028, 1.0126, 1) 
0.9576 

70% 

94.2569 

(-0.9506,0.9930, 1) 
0.9645 

65% 

91.5147 

MED 

P 

GCoD 

% 

£90 

(1.5779, -0.9545, -1) 
0.9530 

63% 

88.5178 

(2.9207,1.0139,1) 
0.9611 

69% 

94.8548 

(0.2899,0.9792, 1) 
0.9655 

65% 

90.5271 


between these models we also report in Table [9] the marginal variations of each one of the models (according to 
Lemma |T]). 

Observe that, when the i\ residuals are considered, all except the MAX criterion provide a 0 marginal 
variation. This pattern can be explained as a result of Lemma [2] and the fact that the £i-norm unit ball in R 2 

( 1 if P 2 = max{|/?i|, |/3 2 |}, 

has extreme points {±(0,1), ±(1, 0)}. Hence k(/?) = < —1 if fa = — max{ | /3i | , | ^2 1 } , ■ Thus, the marginal 

[ 0 otherwise. 

variation of X\ with respect to X 3 is zero iff |/3i| = max{|/3i|, |/321}, being then |/?21 < |/3i|- It means that the 
absolute value of the slope of the line is greater than 1, being the decreasing (or increasing) of the response 
consumption in terms of the incomes more than a 100%. 

In order to validate and analyze the stability of the computed hyperplanes we perform a fc-folcl cross validation 
scheme |43] to the data set. Such a method consists of randomly partitioning the sample into k folds of similar 
size, Si, ..., Sk ■ For each j e {1 ,..., fc}, each optimal hyperplane is computed using the points in (J 7 ; , Si and 
Sj is used to validate the results. In our case, we partitioned the data into k = 7 folds, each of them with 
10 data, except one with 9 points. In Table [TO] we summarize the results obtained with this experiment. We 
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Table 5. Results for bidimensional experiments corrupting the Y variables. 



V 


£oo 

SUM 

p 

GCoD 

% 

ego 

(-0.4324, -1.0070, -1) 
0.5226 

72% 

158.3495 

(-2.7476, -1.1156, -1) 
0.5464 

57% 

144.4862 

(-0.8817, -1.0333, -1) 
0.7637 

73% 

154.9621 

MAX 

P 

GCoD 

% 

£90 

(164.40, 1.95, -1) 
0.0109 

5% 

266.337 

(-131.52, -7.30, -1) 
0.7575 

6% 

144.6019 

(-131.52, -7.30, -1) 
0.7867 

6% 

144.6019 

SOS 

P 

GCoD 

% 

£90 

(-19.4780,0.9765,1) 

0.2459 

24% 

176.2108 

(24.3778, -3.9704, -1) 
0.8055 

12% 

119.0515 

(-21.8989,2.4558,1) 

0.8896 

14% 

108.3728 

1.5SUM 

P 

GCoD 

% 

ego 

(2.2257, -0.9993, -1) 
0.3894 

72% 

161.1331 

(8.1241, -2.8635, -1) 
0.6583 

15% 

114.1084 

(4.2013, -1.5531, -1) 
0.8111 

24% 

107.9904 

kC 

P 

GCoD 

% 

ego 

(-0.6995, -0.9989, -1) 
0.4422 

71% 

159.1129 

(4.8095, -1.6540, -1) 
0.4969 

23% 

100.6695 

(-1.0107, -1.0744, -1) 
0.7265 

67% 

150.2014 

AkC 

P 

GCoD 

% 

ego 

(10.0084, -0.9838, -1) 
0.7526 

53% 

168.5344 

(-1.3062, -1.0398, -1) 
0.9914 

70% 

153.9189 

(-1.2815, -0.9942, -1) 
0.9961 

72% 

159.2534 

MED 

P 

GCoD 

% 

ego 

(8.6545, -0.9641, -1) 
0.8478 

57% 

170.0131 

(-0.8028, -1.0379, -1) 
0.9894 

73% 

154.4849 

(-4.3252, -1.0113, -1) 
0.9947 

69% 

155.1026 



^ 1.5 

r 2 

r 3 

SUM 

p 

GCoD 
% 
eg o 

(-0.9890, -1.0403, -1) 
0.6250 

70% 

154.0857 

(-0.9890, -1.0403, -1) 
0.6658 

70% 

154.0857 

(-0.9890, -1.0403, -1) 
0.7023 

70% 

154.0857 

MAX 

P 

GCoD 
% 
eg o 

(-131.52, -7.30, -1) 
0.7577 

6% 

144.6019 

(-131.52, -7.30, -1) 
0.7598 

6% 

144.6019 

(-131.52, -7.30, -1) 
0.7654 

6% 

144.6019 

SOS 

P 

GCoD 
% 
eg o 

(24.0474, -3.7686, -1) 
0.8077 

13% 

118.4519 

(23.2040, -3.2532, -1) 
0.8195 

13% 

119.827 

(22.5246, -2.8381, -1) 
0.8412 

13% 

115.0321 

1.5SUM 

P 

GCoD 
% 
eg o 

(8.2797, -2.4830, -1) 
0.6667 

14% 

114.0191 

(5.8395, -1.9194, -1) 
0.6976 

19% 

102.4955 

(4.7010, -1.6953, -1) 
0.7384 

23% 

97.65193 

kC 

P 

GCoD 

% 

ego 

(-1.0107, -1.0744, -1) 
0.5665 

67% 

150.2014 

(-1.0107, -1.0744, -1) 
0.6135 

67% 

150.2014 

(-0.8903, -1.0744, -1) 
0.6556 

66% 

150.2834 

AkC 

P 

GCoD 
% 
eg o 

(-2.6754, -1.0658, -1) 
0.9901 

69% 

150.0206 

(-2.7011, -0.9640, -1) 
0.9910 

68% 

161.8515 

(-3.9149, -1.0070, -1) 
0.9915 

69% 

155.8964 

MED 

P 

GCoD 

% 

ego 

(-0.8019, -1.0319, -1) 
0.9911 

74% 

155.184 

(-2.6799, -1.0009, -1) 
0.9924 

70% 

157.4707 

(-1.5141, -1.0345, -1) 
0.9928 

70% 

154.3846 


report: the maximum, minimum, median and mean width of the strips that are necessary to cover the 90% of 
the (validation) data for the seven runs. 

From the above results, we note that the models that use vertical distance residuals need, in general larger 
strips to cover the 90% of the points. The strips are particulary large for the MEDIAN criterion, where the 
widest strips were obtained. This conclusion is justified since the quantile criteria accommodate a single point, 
but do not take into account the deviations to the remainder elements in the data (apart from the ordering in 
the residuals). Also, for the same reason, the conservative MAX criterion makes the models to require wider 
strips. The main observed difference between the MEDIAN and the MAX criteria is that whereas the behavior 
(in term of the fitting strips) of the MAX criterion is similar for the six choices of residuals, the MEDIAN gets 
very different results depending of the chosen residual. The most robust residuals, based on the smallest range 
between the maximum and minimum length of the strips, are the £\, £ 1 , 5 , and £3; while with the same measure 
of robustness, the fc-centrum criterion gets the best results. 

To illustrate the quality of the optimal hyperplanes, in Figure|n]we show the values of the consumptions with 
respect to the actual consumptions for the first random fold in the experiments (in the validation sample that 
was not used to compute the hyperplanes). 
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Table 6. Results for Experiments for d = 4 and corrupting the X variables. 



V 



SUM 

0 

GCoD 

% 

e 90 

(8.7754, 0.2361, 0.1242, -0.0645, 1) 
0.0369 

8% 

285.1339 

(-167.9861, 32.8678, -11.1472, -15.3593, 1) 
0.3527 

9% 

172.616 

(19.6624, 1.9411, 1.4336, -2.6949, 1) 
0.7030 

15% 

166.2396 

MAX 

0 

GCoD 

% 

e 90 

(11.2676, -0.8055, 0.4093, 0.3802, 1) 
0.1200 

2% 

243.9038 

(95.4943, -2.3074, -2.7088, 4.5984, 1) 
0.5037 

9% 

160.86 

(76.9688, -2.1455, -2.9597, 4.6480, 1) 
0.7852 

6% 

164.3572 

SOS 

0 

GCoD 

% 

e 90 

(2.7637, 0.1306, 0.06391, -0.0111, 1) 
0.0409 

6% 

285.0815 

(-35.0079, -17.4180, 5.1138, 8.8243, -1) 
0.5787 

9% 

170.37 

(14.4492, 2.3985, 1.8254, -3.4712, 1) 
0.9085 

8% 

165.6255 

1.5SUM 

0 

GCoD 

% 

e 90 

(3.1382, 0.1714, 0.0663, -0.03521) 
0.0418 

7% 

282.7383 

(21.9152, -18.9245, 5.5144, 9.6284, -1) 
0.4776 

8% 

167.7096 

(-20.1562, -2.0728, -1.5407, 2.9444, -1) 
0.8349 

14% 

165.9725 

kC 

0 

GCoD 

% 

e 90 

(-6.8937, 0.1108, 0.0744, -0.0183, 1) 
0.0258 

8% 

276.4327 

(-34.1432, -15.4977, 4.3066, 7.9523, -1) 
0.3487 

8% 

168.3023 

(5.0421, 2.0898, 1.4381, -2.8638, 1) 
0.6984 

15% 

169.65 

AkC 

0 

GCoD 

% 

e 90 

(-29.5486, 0.5489, 0.2119, 0.2342, 1) 
0.1544 

12% 

304.1316 

(11.5813, 2.8055, -0.1579, 0.1805, 1) 
0.8716 

5% 

306.9669 

(2.7269, 1.0225, 0.9985, 1.0072, 1) 
0.9950 

82% 

496.6216 

MED 

0 

GCoD 

% 

e 90 

(11.3163, 0.5095, 0.5018, 0.0667, 1) 
0.3706 

9% 

283.331 

(15.2913, -1.38181, -0.1062, 9.6624, 1) 
0.8308 

11% 

251.5948 

(2.3001, 1.0447, 1.0149, 1.0033, 1) 
0.9941 

80% 

497.3323 



^1.5 

r 2 


SUM 

0 

GCoD 

% 

e 90 

(-25.3339, 7.2803, 0.3850, -6.5208, 1) 
0.3973 

12% 

167.1534 

(-25.3339, 7.2803, 0.3850, -6.5208, 1) 
0.4630 

12% 

167.1534 

(-48.9741, -2.5251, -1.5173, 3.4889, -1) 
0.5446 

11% 

163.8287 

MAX 

0 

GCoD 

% 

e 90 

(-76.9688, 2.1455, 2.9597, -4.6480, -1) 
0.5510345 

6% 

164.3572 

(-76.9688, 2.1455, 2.9597, -4.6480, -1) 
0.6096547 

6% 

164.3572 

(-76.9688, 2.1455, 2.9597, -4.6480, -1) 
0.677138 

6% 

164.3572 

SOS 

0 

GCoD 

% 

e 90 

(-19.8365, -24.1780, -1.6843, 23.0309, -1) 
0.6391 

9% 

159.013 

(-37.1798, -20.6518, -4.8914, 22.4924, -1) 
0.7149 

9% 

160.1321 

(16.2930, 4.1351, 2.2042, -5.3890, 1) 
0.7921 

4% 

165.3201 

1.5SUM 

0 

GCoD 

% 

e 90 

(27.4692, 14.0582, 1.0081, -12.9659, 1) 
0.5314 

10% 

162.8882 

(27.4555, 14.0608, 1.0082, -12.9683, 1) 
0.6059 

10% 

162.8875 

(-20.4048, -3.2308, -1.6763, 4.1796, -1) 
0.6909 

5% 

164.1443 

kC 

0 

GCoD 

% 

e 90 

(31.8219, 41.5015, -5.2288, -30.4070, 1) 
0.3916 

5% 

165.793 

(2.4227, 14.3655, 4.4768, -15.4827, 1) 
0.4629 

7% 

168.1855 

(6.6713, -3.7849, -1.5627,4.3751, -1) 
0.5440 

4% 

165.9668 

AkC 

0 

GCoD 

% 

e 90 

(7.9530, -1.6065, 0.3482, 0.8960, -1) 
0.7403 

7% 

180.9401 

(-25.2618, -1.0371, -1.4553, 0.7368, -1) 
0.8148 

11% 

244.0442 

(40.7617, -1.6662, -0.5106, 0.5624, -1) 
0.8817 

9% 

231.9954 

MED 

0 

GCoD 

% 

e 90 

(-28.1536, -1.9062, -0.5785, 0.5246, -1) 
0.8278 

9% 

237.8898 

(-51.5261, 1.9897, 1.0285, -0.5282, 1) 
0.8575 

8% 

305.539 

(6.9522, 1.2873, 1.0511, -0.1044, 1) 
0.8941 

14% 

350.0691 


The conclusions are that the vertical distance residuals do not fit well to the actual the trend of the validation 
data. The same conclusion also applies to the models that use the MAX criterion or £<*> residuals. On the other 
hand, the td-residual models seem to fit quite well to the data, whereas the £ T -residual models have similar 
(good) behavior. As expected the kC and AkC criteria, which are known to be very robust, actually capture 
the main information about the trend of the data. 

6.2.2. Complete models. We also performed the same experiments for the whole data set. The three variables 
X\ (incomes), X 2 (prices) and X3 (consumptions) are now considered. The optimal hyperplanes are shown in 
Table [Tl] (since the coefficients are non zero they were divided by —/?3 to make easier the interpretation and 
representations of the models as X3 = /3 0 + P\X\ + ^ 2 X 2 ). 

The summary of the results of the fc-fold cross validation scheme (where the data set was partitioned exactly 
as in the bivariate case) is shown in Table IT2l Finally, Figure [7] shows the values of the consumptions with respect 
to the actual consumptions for the first random fold in the experiments. From the results, one can observe that 
including all the variables in the model reduces the differences among the models obtained with the different 
methods. In this case, the consumption seems to be well linearly described by the incomes and prices. This 
conclusion is supported both by the projection and by the summary of k-cross validation experiments. The 
exceptionally bad performance of the MAX criterion in the former case (the model that only included X\ and 
A3), is now as good as the rest of the criteria. In addition, the inclusion of prices in the model fixes the, in 
most cases, senseless signs of the coefficients in the simple models in Table [9] One can observe that in those 
cases an increase of the incomes would predict a decrease of the consumptions. This unusual trend is fixed by 
introducing the prices in the complete model. 
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Table 7. Results for Experiments for d = 4 and corrupting the Y variables. 



V 


TZ 

SUM 

0 

GCoD 

% 

e 90 

(1.9468, 0.9648, 0.9899, 1.0058, 1) 

0.5999 

78% 

123.5456 

(-1.9158, -1.1083, -0.8751, -3.3186, -1) 
0.6538 

14% 

149.6274 

(1.6655, -1.0083, -1.0530, -1.0446, -1) 
0.9006 

76% 

121.8106 

MAX 

0 

GCoD 

% 

e 90 

(1 - 04.7766, -1.0780, -2.8506, -0.8355, -1) 
0.3357 

12% 

151.6067 

(120.6153, -1.4207, -5.5268, -0.7782, -1) 
0.8267 

7% 

147.4952 

(54.3395, 2.3207, 6.0411, 3.4977, 1) 
0.9078 

12% 

138.4277 

SOS 

0 

GCoD 

% 

e 90 

(-12.1432, -0.8507, -1.0758, -1.1049, -1) 
0.4247 

45% 

124.0456 

(25.1165, -1.2149, -5.4326, -1.1199, -1) 
0.9015 

13% 

135.9287 

(-5.4787, -1.8048, -2.3397, -2.0389, -1) 
0.9801 

15% 

102.1587 

1.5SUM 

0 

GCoD 

% 

e 90 

(-2.1265, -0.9557, -0.9984, -1.0235, -1) 
0.5106 

77% 

124.3694 

(34.3751, -1.0783, -5.2458, -1.0619, -1) 
0.8044 

11% 

139.4734 

(-0.6651, -1.3869, -1.5549, -1.5790, -1) 
0.9485 

22% 

95.54551 

kC 

0 

GCoD 

% 

e 90 

(-0.3095, -0.9816, -1.0017, -1.009643, -1) 
0.5275 

80% 

123.0891 

(2.1980, -0.8680, -0.9950, -3.4086, -1) 
0.6525 

10% 

145.6142 

(-0.6929, -1.0211, -1.0606, -1.0666, -1) 
0.8835 

74% 

120.8033 

AkC 

0 

GCoD 

% 

e 90 

(-7.2126, -0.9981, -1.2345, -0.9988, -1) 
0.8785 

57% 

105.7586 

(-1.7307, -0.9801, -1.0396, -1.0121, -1) 
0.9933 

77% 

120.4785 

(0.1128, -0.9847, -1.0149, -1.0013, -1) 
0.9981 

80% 

121.9634 

MED 

0 

GCoD 

% 

e 90 

(-8.4437, -1.0328, -1.1891, -0.9958, -1) 
0.9011 

58% 

105.9371 

(-3.0605, -0.9660 - 1.0175, -1.0366, -1) 
0.9921 

76% 

123.0289 

(-1.7471, -0.9713, -0.9881, -1.0144, -1) 
0.9980 

79% 

123.8959 



^1.5 



SUM 

0 

GCoD 

% 

e 90 

(0.5934, -1.0202, -1.0588, -1.0264, -1) 
0.7489 

80% 

119.4431 

(0.6616, -1.0203, -1.0584, -1.0270, -1) 
0.8006 

80% 

119.5293 

(0.9775, -1.0098, -1.0563, -1.0343, -1) 
0.8418 

78% 

120.6788 

MAX 

0 

GCoD 

% 

e 90 

(120.6153, -1.4207, -5.5268, -0.7782, -1) 
0.8267 

7% 

147.4952 

(-54.3395, -2.3207, -6.0411, -3.4977, -1) 
0.8384 

12% 

138.4277 

(-54.3395, -2.3207, -6.0411, -3.4977, -1) 
0.8643 

12% 

138.4277 

SOS 

0 

GCoD 

% 

e 90 

(-14.4853, 1.5436, 4.4201, 1.5950, 1) 
0.9022 

13% 

131.3351 

(-0.3904, 1.7361, 2.9264, 2.0617, 1) 
0.9272 

10% 

114.7621 

(4.7620, 1.9721, 2.5444, 2.0415, 1) 
0.9514 

12% 

106.4697 

1.5SUM 

0 

GCoD 

% 

e 90 

(15.7120, -1.1641, -2.6186, -1.8366, -1) 
0.8079 

21% 

114.939 

(-0.8627, -1.4497, -1.6239, -1.9098, -1) 
0.8565 

22% 

97.67539 

(-0.6434, -1.4056, -1.5798, -1.5348, -1) 
0.8965 

20% 

97.29497 

kC 

0 

GCoD 

% 

e 90 

(-1.0976, -1.0234, -1.0643, -1.0656, -1) 
0.7053 

74% 

120.25 

(-1.0942, -1.0234, -1.0641, -1.0656, -1) 
0.7661 

74% 

120.262 

(-0.7613, -1.0216, -1.0617, -1.0665, -1) 
0.8144 

74% 

120.6901 

AkC 

0 

GCoD 

% 

e 90 

(0.8072, -0.9319, -1.1111, -1.0901, -1) 
0.9929 

64% 

124.0139 

(-1.5573, -0.9672, -0.9991, -1.0184, -1) 
0.9954 

77% 

123.7847 

(2.4443, -1.0165, -0.9923, -1.0147, -1) 
0.9930 

82% 

123.5452 

MED 

0 

GCoD 

% 

€ 90 

(-0.6735, -0.9887, -1.0180, -0.9497, -1) 
0.9945 

75% 

118.3319 

(0.4156, -0.9995, -1.0147, -1.0116, -1) 
0.9949 

81% 

121.9701 

(-1.1572, -0.9753, -1.0309, -0.9853, -1) 
0.9964 

78% 

120.0091 


Table 8. Estimations for the bidimensional Durbin-Watson’s dataset. 



v li IZ 

(4.0898,-1.1454,-1) (10.8840,-4.6184,-1) (8.9764,-3.6797,-1) 

(1.6986,-0.0196,-1) (1.6986,-0.0196,-1) (-0.5963,1.1530,-1) 

(2.9993,-0.6309,-1) (13.5934,-6.0703,-1) (7.0978,-2.7353,-1) 

(4.0730,-1.1566,-1) (10.6113,-4.5067,-1) (7.9926,-3.1851,-1) 

(5.5288,-1.9236,-1) (8.7033,-3.5303,-1) (7.6654,-2.9977,-1) 

(2.7467,-0.4031,-1) (17.1272,-7.6311,-1) (18.4349,-8.2833,-1) 

(2.4167, -0.2310, -1) (28.0156, -13.0469, -1) (23.4462, -10.7748, -1) 



(10.8840, 

-4.6184, 

-i) 

(10.8746, 

-4.6138, 

-1) 

(9.8917, - 

-4.1344, 

-i) 

(1.6986, - 

-0.0196, 

-i) 

(-0.5963 

1, 1.1530, 

-1) 

(-0.5963, 

1.1530, 

-i) 

(13.1400, 

-5.8376, 

-i) 

(10.9561, 

-4.7162, 

-1) 

(8.7832, - 

-3.6006, 

-i) 

(10.4466, 

-4.4233, 

-i) 

(9.6868, - 

-4.0399, 

-1) 

(8.9821, - 

-3.6851, 

-i) 

(8.0130, - 

-3.1750, ■ 

-i) 

(8.0455, - 

-3.1914, 

-1) 

(8.5389, - 

-3.4427, 

-i) 

(13.9827, 

-6.0670, 

-i) 

(21.0745, 

-9.6064, 

-1) 

(20.6955, - 

-9.4349, 

-i) 

(24.0656, - 

-11.0819, 

. -i) 

(6.4510, 

-2.4601, 

-1) 

(28.0150, - 

-13.0466 

,-i) 


Table 9. Marginal variations for each of the models. 
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Figure 5. 


Estimated lines for the data in [16] . 






Table 10. Summary of k-fold cross validations experiments for the bidimensional 

Durbin-Watson’s dataset. 



V 



^1.5 


*3 


minego 

0.1590 

0.0560 

0.0702 

0.0491 

0.0459 

0.0560 

SUM 

max egg 

0.3049 

0.1645 

0.1444 

0.1477 

0.1480 

0.1480 

medianegg 

0.2366 

0.0983 

0.0923 

0.0881 

0.0828 

0.0983 


^90 

0.2330 

0.1027 

0.0982 

0.0958 

0.0959 

0.1021 


min eg 0 

0.1262 

0.1274 

0.1262 

0.1262 

0.1262 

0.1274 

MAX 

max ego 

0.3955 

0.3955 

0.3663 

0.3663 

0.3663 

0.3955 

medianegg 

0.3664 

0.3664 

0.3621 

0.3621 

0.3621 

0.3664 


£90 

0.3337 

0.3338 

0.3222 

0.3222 

0.3222 

0.3338 


min ego 

0.1372 

0.0844 

0.0566 

0.0568 

0.0633 

0.0793 

SOS 

max ego 

0.4072 

0.1264 

0.1163 

0.1202 

0.1235 

0.1253 

medianegg 

0.2878 

0.0962 

0.0983 

0.0879 

0.0961 

0.0961 


£90 

0.2980 

0.1005 

0.0973 

0.0900 

0.0905 

0.0983 


min ego 

0.1437 

0.0476 

0.0488 

0.0524 

0.0499 

0.0478 

1.5SUM 

max ego 

0.3091 

0.1353 

0.1199 

0.1254 

0.1308 

0.1334 

medianegg 

0.2260 

0.0834 

0.0852 

0.0910 

0.0885 

0.0841 


£90 

0.2349 

0.0922 

0.0872 

0.0869 

0.0884 

0.0917 


min ego 

0.1236 

0.0414 

0.0655 

0.0495 

0.0480 

0.0412 

kC 

max ego 

0.2843 

0.1220 

0.1147 

0.1163 

0.1185 

0.1219 

medianegg 

0.1281 

0.0837 

0.0837 

0.0851 

0.0851 

0.0855 


£90 

0.1511 

0.0827 

0.0834 

0.0800 

0.0809 

0.0821 


min ego 

0.4482 

0.0421 

0.0429 

0.0367 

0.0892 

0.0484 

akC 

max ego 

0.6677 

0.2039 

0.1853 

0.2122 

0.4654 

0.1981 

medianegg 

0.5162 

0.1722 

0.1296 

0.1605 

0.1534 

0.1466 


£90 

0.5282 

0.1434 

0.1338 

0.1417 

0.1914 

0.1373 


min ego 

0.4275 

0.1182 

0.1147 

0.0979 

0.1182 

0.0615 

MED 

max ego 

0.6375 

0.2170 

0.4612 

0.2203 

0.2137 

0.2101 

medianegg 

0.5503 

0.1712 

0.1761 

0.1701 

0.1393 

0.1565 


£90 

0.5406 

0.1651 

0.2093 

0.1614 

0.1501 

0.1478 


Table 11. Estimations for the Durbin-Watson’s dataset. 


v ~ 


SUM 

(4.4817, 0.0696, 

-1.3374, 

-i) 

(4.555, 0.0587, 

-1.3623, 

=iT 

MAX 

(4.5227, 0.0646, 

-1.3519, 

-i) 

(4.6159, -0.013 

-1.3273, 

-1) 

SOS 

(3.9725, 0.0331, 

-1.0692, 

-i) 

(4.404, 0.1369, 

-1.3881. 

-1) 

1.5SUM 

(4.404, 0.1369, 

-1.3881, 

-i) 

(4.404, 0.1369, 

-1.3881, 

-1) 

kC 

(4.4159, 0.0288, 

-1.2753, 

-i) 

(4.4905, 0.0635, 

-1.3425, 

-1) 

AkC 

(4.4355, 0.0655, 

-1.3183, 

-i) 

(4.4521, 0.0585, 

-1.3197, 

-1) 

MED 

(4.4288, 0.0488, 

-1.2979, 

-i) 

(4.5075, 0.0634, 

-1.3476, 

-1) 


- - - 

(4.1367, 0.3502, -1.4305, -1) 
(4.1355, 0.5086, -1.5758, -1) 
(4.404, 0.1369, -1.3881, -1) 
(4.404, 0.1369, -1.3881, -1) 
(4.3334, 0.1325, -1.3317, -1) 
(4.4688, 0.0535, -1.323, -1) 
(4.3559, 0.1431, -1.3489, -1) 



^1.5 

SUM 

(4.4445, 0.0698, -1.3242, -1) 

MAX 

(4.4155, 0.0352, -1.2797, -1) 

SOS 

(4.3498, 0.1131, -1.3201, -1) 

1.5SUM 

(4.2123, 0.4308, -1.5386, -1) 

kC 

(5.2647, -0.6758, -1.0312, -1) 

AkC 

(4.1061, 0.5015, -1.551, -1) 

MED 

(4.3576, 0.2689, -1.4559, -1) 


^ 2 

(4.472, 0.0633, -1.331, -1) 


(4.3938, 0.1107, -1.3377, -1) 
(4.3498, 0.1131, -1.3201, -1) 
(4.0853, 0.4429, -1.4891, -1) 
(3.5719, 1.1094, -1.8642, -1) 
(4.1579, 0.467, -1.5434, -1) 
(4.0772, 0.4066, -1.4415, -1) 


(4.4922, 0.0619, -1.3386, -1) 
(4.2655, 0.1691, -1.3326, -1) 
(4.3498, 0.1131, -1.3201, -1) 
(3.6048, 0.7761, -1.5744, -1) 
(3.4912, 1.0623, -1.7796, -1) 
(4.2963, 0.3239, -1.4761, -1) 
(76.3635, 25.0913, -61.4268, -1) 
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Figure 6 . Responses in the dependent variable by residuals for the bivariate case (SUM: red, 
MAX: blue, SOS: green, 1.5SUM: yellow, kC: black, AkC: orange, MEDIAN: gray) . 



Table 12. Summary of k-fold cross validations experiments for the Durbin-Watson’s dataset. 



V 

^1 


h.s 

4 

4 


mine 90 

0.0369 

0.0388 

0.0315 

0.0380 

0.0346 

0.0347 

SUM 

max egg 

0.0735 

0.0741 

0.0832 

0.0743 

0.0743 

0.0732 

medianegg 

0.0629 

0.0627 

0.0647 

0.0625 

0.0625 

0.0626 


e 90 

0.0573 

0.0598 

0.0616 

0.0580 

0.0567 

0.0593 


min eg 0 

0.0562 

0.0515 

0.0515 

0.0515 

0.0515 

0.0515 

MAX 

max ego 

0.0807 

0.0762 

0.0760 

0.0760 

0.0760 

0.0762 

medianegg 

0.0701 

0.0607 

0.0644 

0.0644 

0.0607 

0.0607 


e 90 

0.0678 

0.0624 

0.0641 

0.0641 

0.0624 

0.0624 


min ego 

0.0255 

0.0362 

0.0310 

0.0321 

0.0327 

0.0327 

SOS 

max ego 

0.0656 

0.0683 

0.0691 

0.0678 

0.0675 

0.0675 

medianegg 

0.0586 

0.0583 

0.0568 

0.0586 

0.0581 

0.0582 


e 90 

0.0547 

0.0541 

0.0537 

0.0543 

0.0528 

0.0529 


min ego 

0.0262 

0.0342 

0.0292 

0.0308 

0.0314 

0.0316 

1.5SUM 

max ego 

0.0685 

0.0709 

0.0713 

0.0691 

0.0703 

0.0703 

medianegg 

0.0617 

0.0563 

0.0587 

0.0559 

0.0556 

0.0558 


e 90 

0.0553 

0.0547 

0.0546 

0.0527 

0.0531 

0.0532 


min ego 

0.0269 

0.0368 

0.0265 

0.0251 

0.0272 

0.0272 

kC 

max ego 

0.0650 

0.0700 

0.0698 

0.0709 

0.0709 

0.0700 

medianegg 

0.0588 

0.0564 

0.0559 

0.0559 

0.0569 

0.0571 


e 90 

0.0514 

0.0549 

0.0536 

0.0534 

0.0538 

0.0535 


min eg 0 

0.0349 

0.0338 

0.0360 

0.0305 

0.0256 

0.0604 

akC 

max ego 

0.1042 

0.1041 

0.1017 

0.3524 

0.1100 

0.1303 

medianegg 

0.0906 

0.0888 

0.0820 

0.0885 

0.0676 

0.0931 


e 90 

0.0815 

0.0799 

0.0778 

0.1115 

0.0713 

0.0923 


min eg 0 

0.0342 

0.0329 

0.0346 

0.0332 

0.0429 

0.0270 

MED 

max ego 

0.1064 

0.0994 

0.0997 

0.1102 

0.3410 

0.3266 

medianegg 

0.0709 

0.0872 

0.0894 

0.0649 

0.0844 

0.0714 


e 90 

0.0738 

0.0784 

0.0794 

0.0671 

0.1215 

0.1012 


7. Conclusions and Further Research 

This paper introduces a new framework for fitting hyperplanes to a given set of points by considering distance- 
based residuals and applying generalized ordered weighted averaging aggregation criteria. Mathematical pro¬ 
gramming formulations are proposed for those models and some properties are proven. Two important particular 
cases of residuals are analyzed in more detail, namely those induced by block norms or t T norms for r > 1. A 
new goodness of fitting measure is also introduced for this framework, which extends the classical coefficient of 
determination in least sum of squares fitting with vertical distances. Extensive computational experiments run 
in Gurobi under R are reported in order to illustrate and validate the new methodology for computing optimal 
fitting hyperplanes. 

The results in this paper admit some extensions applying similar tools. Among them we mention regular¬ 
ization adding constraints to overcome ill-posed data set, the simultaneous computation of several (more than 
one) hyperplanes to a given data set such that each single point is “allocated” to its closest model. This ap¬ 
proach would allow to analyze structural changes on the behavior of the data (in different periods of time or for 
different values of one of the variables). The main, non trivial, difference between those models and the ones 
proposed in this paper is analogous to that that exists between the so-called single-facility and multifacility 
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Figure 7. Responses in the dependent variable by residuals for the d = 3 case (SUM: red, 
MAX: blue, SOS: green, 1.5SUM: yellow, kC: black, AkC: orange, MEDIAN: gray) . 








location problems (see [32]). It is well-known that multifacility problems become easily hard even if the single¬ 
facility case were easy. Hence, although very interesting, the above extension needs further analysis. Another 
interesting extension is the use of mathematical programming tools to fit hyperplanes to binary data. The usual 
techniques to estimate those models are based on likelihood estimation since least squares estimation is known 
to get no desirable results on this type of data. Here our proposal will fit in a natural way and will deserve 
further attention. 
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